Translation results for 文档频率
文档频率
Related sentences
  document frequency
     Four methods were evaluated, including document frequency (DF), information gain (IG), mutual information (MI) and the χ² test (CHI).
     考察了文档频率DF、信息增益IG、互信息MI、χ²分布CHI四种不同的特征选取方法。
     Four different feature selection methods are discussed, including Document Frequency (DF), Mutual Information (MI), the χ² test (CHI), and Correlation Coefficient (CC), and their text categorization accuracy is compared using the K-nearest-neighbor algorithm.
     考察了文档频率DF、互信息MI、CHI统计、CC统计四种不同的特征选择方法,并结合K近邻算法进行分类精度上的比较.
     They are based on document frequency (DF), the χ² test (CHI), mutual information (MI), and information gain (IG).
     在特征提取部分,结合了基于文档频率(DF)、χ²分布(CHI)、信息增益(IG)以及互信息(MI)等几种不同的特征选择方法,通过实验结果的比较,证明在本文的系统中基于IG的特征选择方法要优于其他方法。
     By using a weighting algorithm that combines the traditional IDF (inverse document frequency) with a new distribution-information method, and by introducing a new concept, LFHW (low-frequency but high-weight terms), this system offers a new approach to term weighting.
     本系统采用了传统的反文档频率算法与特征的类别分布信息相结合的新算法(特征项在类间及类内的分布信息)对特征项作加权处理。 针对有些频率很低但重要度却很高的特征项的特点,本系统提出了低频高权特征的概念,并为每个类别建立低频高权特征集。
     This paper presents a study of seven feature selection methods that are commonly used in text categorization: document frequency, information gain, mutual information, χ² statistic, expected cross entropy, weight of evidence for text, and odds ratio.
     比较了文档频率、信息增益、互信息、χ²统计量、期望交叉熵、文本证据权以及几率比等7种常用于文本分类的特征选择算法。
  Bilingual example sentences in which the translation of “文档频率” is undetermined
     Conclusion: The similarity measure used in group 4 (collected papers plus TF-IDF) is the most effective for computing similarity; it supports document clustering and ranking effectively.
     采用词频法(TF法)和词频-逆文档频率法(TF-IDF法)计算索引词权重。 两两配对形成四种相关度算法,组1:样本课题检出文献为总体+TF法;
     This work studies the feature extraction methods DF, IG, CHI, TS and MI, finds that their extraction efficiency ranks CHI>MI>DF>TS>IG, and shows that combining several of them gives better results.
     对常用的特征提取方法文档频率(DF)、信息收益(IG)、互信息(MI)、开方拟合检验(CHI)、术语强度(TS),进行了研究,指出其提取效率的高低为CHI>MI>DF>TS>IG,并阐明综合运用多种特征提取方法才能得到较好的提取效果。
     The weights of features are calculated by formula, and then the E-mail corpus is presented in vector space model which can be processed by the computer.
     使用词频反文档频率计算选择的特征词的权重,通过算法把标准邮件集处理成了支持向量机算法可以直接处理的向量空间模型的形式。
     The DF-based feature extraction method has low computational complexity, and its performance is comparable to that of the CHI and IG methods. It is suitable for large-scale text classification tasks.
     基于文档频率的特征提取方法计算复杂度低,而且提取出的特征的分类效果也和卡方检验以及信息增益等方法相当,比较适合于大规模的文本分类任务。
     Categorization methods based on EPs have been applied successfully in many fields, such as DNA analysis, but we have not seen reports of applying them to automatic text categorization.
     本文在分析了文档频率特征提取方法存在的问题的基础上,给出了特征的区分能力的概念,提出了一种基于区分能力的特征提取方法,并将基于eEPs的分类方法应用到中文文本分类领域,提出并实现了一个基于eEPs的中文文本分类算法TCEP。
  Similar matched sentence pairs
     FREQUENCY
     生死频率
     ON FREQUENCIES
     频率
     Storage of XML Data Based on Traversal Frequency
     一种基于访问频率的XML文档存储方法
     A Survey of Document Clustering
     文档聚类综述
     An Integral Document Management Information System
     文档一体化综合管理信息系统
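Example sentences
To help you understand how the query term or its translations are actually used in idiomatic English, the following example sentences, taken from original English texts, are provided for reference.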
  document frequency
We study the performance for varying numbers of features, along with three different feature selection methods: information gain (IG), document frequency (DF) and odds ratio.
      
In this paper, we describe a new feature selection measure, class-wise document frequency of byte n-grams.
      
A document frequency of five or greater is required for the candidate year.
      
Alternatively, the probability values can be computed within L1 from the available occurrence data, for example their inverse document frequency.
      
Any word which has a document frequency above or below certain pre-decided threshold is then removed.
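
The document-frequency thresholding described in the example sentences above can be sketched as follows. This is a minimal illustration, not any cited paper's implementation: the function names `document_frequency` and `filter_by_df`, the thresholds `min_df` and `max_df`, and the toy corpus are all assumptions introduced here.

```python
from collections import Counter

def document_frequency(docs):
    """Count, for each term, the number of documents that contain it."""
    df = Counter()
    for tokens in docs:
        # set(): a term counts once per document, however often it occurs
        df.update(set(tokens))
    return df

def filter_by_df(docs, min_df=2, max_df=3):
    """Keep terms whose document frequency lies in [min_df, max_df].

    The threshold values are illustrative assumptions.
    """
    df = document_frequency(docs)
    return {term for term, n in df.items() if min_df <= n <= max_df}

# Toy corpus of pre-tokenized documents (hypothetical data).
docs = [
    ["text", "categorization", "feature", "the"],
    ["feature", "selection", "the"],
    ["text", "retrieval", "the"],
    ["image", "retrieval", "the"],
]
df = document_frequency(docs)
vocab = filter_by_df(docs, min_df=2, max_df=3)
```

With these thresholds, terms that occur in only one document and the term that occurs in every document are both dropped, which is the usual motivation for a two-sided cut.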
      
Because image databases are very large, feature extraction and retrieval are usually time consuming. To make effective use of existing text information retrieval methods in content-based image retrieval, especially the index mechanism based on the product tf * idf of term frequency (tf) and inverse document frequency (idf) for each text document, this paper combines the tf * idf model with a triangle tree to improve retrieval performance. First, after the pixel-based histogram features of sub-blocks in a certain image class are mapped to a color concept space through independent component analysis (ICA), all the independent keyblocks of that image class are obtained; then a well-trained fuzzy support vector machine is used to recognize the independent keyblocks contained in each image. Similar to text retrieval, in which the whole text document is indexed by keywords, the recognized independent keyblocks are used to index each image in the database. Because independent component features are naturally high-order independent of each other, this algorithm achieves higher performance than the principal component analysis (PCA) method. Finally, a triangle tree is used to hierarchically index the image database and thereby speed up retrieval.

由于图象存储数据量非常大,因此提取图象特征和检索极为耗时.为了提高图象检索效率,将文本检索中的有效检索方法(基于关键字频率与关键字逆文档频率乘积的索引模型)结合三角树索引机制应用到基于内容的图象检索,提出了一种基于独立关键子块和三角树的快速图象检索新方法.该方法首先用独立分量分析将样本图象子块中的直方图特征映射到色彩概念空间来得到类似于文本中关键字的独立关键子块;然后再用训练好的模糊支持向量机去识别每幅图象中所包含的独立关键子块,由于独立分量分析能够使特征彼此保持高阶独立性,因此该方法与主成分分析方法对比,具有较高检索效率;最后,再通过构造三角树来来为图象数据库建立分层索引结构,以加快检索速度.
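
The tf * idf weight mentioned in the abstract above can be sketched for plain text as follows. This is a minimal illustration that assumes the common variant idf = log(N / df) with raw term counts as tf; the abstract does not say which variant the paper uses, and the function name `tf_idf` and the toy documents are assumptions.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per document.

    Assumed variant: weight = tf * log(N / df), where tf is the raw count
    of the term in the document, N is the number of documents, and df is
    the number of documents containing the term.
    """
    n_docs = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # each document counts a term once
    weights = []
    for tokens in docs:
        tf = Counter(tokens)
        weights.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return weights

# Toy corpus (hypothetical data).
docs = [["red", "red", "sky"], ["red", "sea"]]
weights = tf_idf(docs)
```

A term that appears in every document gets weight zero under this variant, so it contributes nothing to the index; smoothed variants avoid that but are not shown here.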

This paper is a comparative study of feature selection methods in text categorization. Four methods were evaluated, including document frequency (DF), information gain (IG), mutual information (MI) and the χ² test (CHI). A Support Vector Machine (SVM) and a k-nearest-neighbor (KNN) classifier were selected as the evaluating classifiers. We found that IG, MI and CHI performed poorly in our test, though they behave well in English text categorization. We analyzed the reasons theoretically and put forward possible solutions. A further experiment showed that the combined feature selection method is effective.

本文比较研究了在中文文本分类中特征选取方法对分类效果的影响。考察了文档频率DF、信息增益IG、互信息MI、χ2分布CHI四种不同的特征选取方法。采用支持向量机 (SVM )和KNN两种不同的分类器以考察不同抽取方法的有效性。实验结果表明 ,在英文文本分类中表现良好的特征抽取方法 (IG、MI和CHI)在不加修正的情况下并不适合中文文本分类。文中从理论上分析了产生差异的原因 ,并分析了可能的矫正方法包括采用超大规模训练语料和采用组合的特征抽取方法。最后通过实验验证组合特征抽取方法的有效性。
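
As an illustration of one of the four measures compared in the abstract above, the χ² statistic of a term with respect to a class is commonly computed from a 2×2 contingency table. The sketch below uses that commonly cited formulation; it is not necessarily the exact form used in the paper, and the function name `chi_square` and its argument names are assumptions.

```python
def chi_square(a, b, c, d):
    """Chi-square statistic of a term with respect to a class.

    a: documents in the class that contain the term
    b: documents outside the class that contain the term
    c: documents in the class that do not contain the term
    d: documents outside the class that do not contain the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        # Degenerate table (an empty row or column): treat as no association.
        return 0.0
    return n * (a * d - c * b) ** 2 / denom

# A term that occurs in exactly the documents of the class (hypothetical counts).
perfect = chi_square(2, 0, 0, 2)
# A term distributed identically inside and outside the class.
independent = chi_square(1, 1, 1, 1)
```

Higher values indicate a stronger dependence between term and class; a term that is independent of the class scores zero.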

Topic detection techniques can be used to detect news topics in a large news stream and help us organize and use news information by topic. In this paper, a Chinese news topic detection system is designed and tested, using an improved time window strategy and a self-adaptive inverse document frequency. The results indicate that the proposed system is effective. Its application in the data center of XinHua Net further illustrates that the system meets application demands well.

利用主题检测技术可以从海量新闻信息中实时检测到主题信息,从而将新闻信息按照主题组织并加以利用.文中通过改进加窗策略,采用自适应倒排文档频率,设计了一个中文新闻主题检测系统并进行了实验.结果表明了该系统的有效性.该系统在新华网数据中心的成功应用进一步表明系统达到了实用需求.

 
© 2008 CNKI (中国知网)