    Clustering by fast search and find of PDF download

    Download (compiled by this site):
    Link: https://pan.baidu.com/s/1FT1ErrYcskfg3dyGduLA9Q
    Extraction code: lqic
     
     
    Main content:

    Cluster analysis is aimed at classifying elements into categories on the basis of their
    similarity. Its applications range from astronomy to bioinformatics, bibliometrics, and pattern
    recognition. We propose an approach based on the idea that cluster centers are characterized
    by a higher density than their neighbors and by a relatively large distance from points with
    higher densities. This idea forms the basis of a clustering procedure in which the number of
    clusters arises intuitively, outliers are automatically spotted and excluded from the analysis, and
    clusters are recognized regardless of their shape and of the dimensionality of the space in which
    they are embedded. We demonstrate the power of the algorithm on several test cases.
    Clustering algorithms attempt to classify
    elements into categories, or clusters, on
    the basis of their similarity. Several different clustering strategies have been proposed (1), but no consensus has been reached
    even on the definition of a cluster. In K-means (2)
    and K-medoids (3) methods, clusters are groups
    of data characterized by a small distance to the
    cluster center. An objective function, typically the
    sum of the distance to a set of putative cluster
    centers, is optimized (3–6) until the best cluster
    centers candidates are found. However, because
    a data point is always assigned to the nearest
    center, these approaches are not able to detect
    nonspherical clusters (7). In distribution-based algorithms, one attempts to reproduce the observed
    realization of data points as a mix of predefined
    probability distribution functions (8); the accuracy
    of such methods depends on the capability of the
    trial probability to represent the data.
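The nearest-center assignment rule described above can be sketched in a few lines. This is a minimal illustration, not a full K-means or K-medoids implementation: the function names `assign_to_nearest_center` and `objective`, the sample points, and the fixed centers are all hypothetical, and Euclidean distance is assumed.

```python
import math

def assign_to_nearest_center(points, centers):
    """Return, for each point, the index of its nearest center."""
    return [
        min(range(len(centers)), key=lambda k: math.dist(p, centers[k]))
        for p in points
    ]

def objective(points, centers, labels):
    """Sum of distances from each point to its assigned center."""
    return sum(math.dist(p, centers[k]) for p, k in zip(points, labels))

# Hypothetical data: two compact groups and two putative centers.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 1.0)]
centers = [(0.0, 0.5), (5.0, 0.5)]

labels = assign_to_nearest_center(points, centers)
total = objective(points, centers, labels)
```

Because every point goes to whichever center is closest, the boundary between two clusters is always the set of points equidistant from the two centers, which is why elongated or curved clusters get split.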
    Clusters with an arbitrary shape are easily
    detected by approaches based on the local density of data points. In density-based spatial clustering of applications with noise (DBSCAN) (9),
    one chooses a density threshold, discards as noise
    the points in regions with densities lower than
    this threshold, and assigns to different clusters
    disconnected regions of high density. However,
    choosing an appropriate threshold can be nontrivial, a drawback not present in the mean-shift
    clustering method (10, 11). There a cluster is defined as a set of points that converge to the same
    local maximum of the density distribution function. This method allows the finding of nonspherical clusters but works only for data defined by a
    set of coordinates and is computationally costly.
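The density-threshold idea described above can be sketched as follows. This is a simplified illustration rather than the full DBSCAN algorithm (it has no notion of border points): points with too few neighbors are marked as noise, and the remaining dense points are grouped into connected regions. The function name `threshold_clusters` and the parameters `eps` and `min_neighbors` are assumptions chosen for this example.

```python
import math

def threshold_clusters(points, eps, min_neighbors):
    """Label dense connected regions 0, 1, ...; label low-density points -1."""
    n = len(points)
    # Neighbor lists: all other points within distance eps.
    neighbors = [
        [j for j in range(n)
         if j != i and math.dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    # A point is "dense" if it has at least min_neighbors neighbors.
    dense = [len(neighbors[i]) >= min_neighbors for i in range(n)]
    labels = [-1] * n  # -1 marks noise
    cluster = 0
    for start in range(n):
        if not dense[start] or labels[start] != -1:
            continue
        # Flood-fill one connected region of dense points.
        labels[start] = cluster
        stack = [start]
        while stack:
            i = stack.pop()
            for j in neighbors[i]:
                if dense[j] and labels[j] == -1:
                    labels[j] = cluster
                    stack.append(j)
        cluster += 1
    return labels

# Hypothetical data: two tight groups and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
labels = threshold_clusters(points, eps=1.5, min_neighbors=2)
```

The result depends directly on `eps` and `min_neighbors`, which mirrors the difficulty of choosing a threshold noted in the text.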
    Here, we propose an alternative approach.
    Similar to the K-medoids method, it has its
    basis only in the distance between data points.
    Like DBSCAN and the mean-shift method, it is
    able to detect nonspherical clusters and to automatically find the correct number of clusters.
    The cluster centers are defined, as in the mean-shift method, as local maxima in the density of
    data points. However, unlike the mean-shift method, our procedure does not require embedding
    the data in a vector space and maximizing explicitly the density field for each data point.
    The algorithm has its basis in the assumptions
    that cluster centers are surrounded by neighbors
    with lower local density and that they are at a
    relatively large distance from any points with a
    higher local density. For each data point $i$, we
    compute two quantities: its local density $\rho_i$ and
    its distance $\delta_i$ from points of higher density. Both
    these quantities depend only on the distances $d_{ij}$
    between data points, which are assumed to satisfy the triangular inequality. The local density $\rho_i$
    of data point $i$ is defined as
    $$\rho_i = \sum_j \chi(d_{ij} - d_c) \tag{1}$$
    where $\chi(x) = 1$ if $x < 0$ and $\chi(x) = 0$ otherwise,
    and $d_c$ is a cutoff distance. Basically, $\rho_i$ is equal to
    the number of points that are closer than $d_c$ to
    point $i$. The algorithm is sensitive only to the relative magnitude of $\rho_i$ in different points, implying
    that, for large data sets, the results of the analysis
    are robust with respect to the choice of $d_c$.
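The two per-point quantities described above can be computed directly from a matrix of pairwise distances. The sketch below makes several assumptions that this excerpt does not state: the point itself is excluded from its own neighbor count, the distance from points of higher density is taken as the minimum over those points, and a point with no denser neighbor is given its largest distance to any other point. The names `local_density`, `distance_to_higher_density`, `rho`, `delta`, and `d_c` are chosen for illustration.

```python
def local_density(dist, d_c):
    """Count, for each point i, the other points j with dist[i][j] < d_c.

    Assumption: j == i is skipped; including it would add 1 to every
    count and leave the relative magnitudes unchanged.
    """
    n = len(dist)
    return [
        sum(1 for j in range(n) if j != i and dist[i][j] < d_c)
        for i in range(n)
    ]

def distance_to_higher_density(dist, rho):
    """Distance from each point to the nearest point of higher density."""
    n = len(dist)
    delta = []
    for i in range(n):
        higher = [dist[i][j] for j in range(n) if rho[j] > rho[i]]
        # Assumption: with no denser point, fall back to the largest
        # distance from i to any other point.
        delta.append(min(higher) if higher else max(dist[i]))
    return delta

# Hypothetical 1-D data: two groups of three points each.
xs = [0.0, 1.0, 2.0, 10.0, 11.0, 12.0]
dist = [[abs(a - b) for b in xs] for a in xs]

rho = local_density(dist, d_c=1.5)
delta = distance_to_higher_density(dist, rho)
```

In this toy data the middle point of each group has both the highest density and a large distance to any denser point, which is the signature the text attributes to cluster centers. Only the distance matrix is used, so the points never need coordinates.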
    1492 27 JUNE 2014 • VOL 344 ISSUE 6191 sciencemag.org SCIENCE
    RESEARCH | REPORTS
    SISSA (Scuola Internazionale Superiore di Studi Avanzati),
    via Bonomea 265, I-34136 Trieste, Italy.
    E-mail: [email protected] (A.L.); [email protected] (A.R.)

     


