Distributed Representations of Sentences and Documents PDF Download
Proceedings of the 31st International Conference on Machine Learning, Beijing, China, 2014. JMLR: W&CP volume 32.


Download compiled by this site:
Link: https://pan.baidu.com/s/1D39iUhpLFLSqsQUBD9gnTQ
Extraction code: a219

Main content:

1. Introduction

Text classification and clustering play an important role in many applications, e.g., document retrieval, web search, spam filtering. At the heart of these applications are machine learning algorithms such as logistic regression or K-means. These algorithms typically require the text input to be represented as a fixed-length vector. Perhaps the most common fixed-length vector representation for texts is the bag-of-words or bag-of-n-grams (Harris, 1954), due to its simplicity, efficiency and often surprising accuracy.

However, the bag-of-words (BOW) has many disadvantages. The word order is lost, and thus different sentences can have exactly the same representation as long as the same words are used. Even though bag-of-n-grams considers the word order in short contexts, it suffers from data sparsity and high dimensionality. Bag-of-words and bag-of-n-grams also have very little sense of the semantics of the words or, more formally, the distances between the words. This means that the words “powerful,” “strong” and “Paris” are equally distant despite the fact that, semantically, “powerful” should be closer to “strong” than to “Paris.”
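
For a concrete view of the order-loss problem, here is a minimal sketch (not from the paper) using scikit-learn's CountVectorizer; the two example sentences are illustrative:

    from sklearn.feature_extraction.text import CountVectorizer

    # Two sentences with the same words but opposite meanings.
    docs = [
        "the dog bit the man",
        "the man bit the dog",
    ]

    vec = CountVectorizer()
    bows = vec.fit_transform(docs).toarray()

    print(vec.get_feature_names_out())  # ['bit' 'dog' 'man' 'the']
    print(bows[0])  # [1 1 1 2]
    print(bows[1])  # [1 1 1 2] -- identical bag-of-words representations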
In this paper, we propose Paragraph Vector, an unsupervised framework that learns continuous distributed vector representations for pieces of texts. The texts can be of variable length, ranging from sentences to documents. The name Paragraph Vector emphasizes that the method can be applied to variable-length pieces of texts, anything from a phrase or sentence to a large document.

In our model, the vector representation is trained to be useful for predicting words in a paragraph. More precisely, we concatenate the paragraph vector with several word vectors from a paragraph and predict the following word in the given context. Both word vectors and paragraph vectors are trained by stochastic gradient descent and backpropagation (Rumelhart et al., 1986). While paragraph vectors are unique among paragraphs, the word vectors are shared. At prediction time, the paragraph vectors are inferred by fixing the word vectors and training the new paragraph vector until convergence.
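
The paper ships no code, but the train-then-infer workflow just described can be sketched with gensim's Doc2Vec, a widely used implementation of Paragraph Vector. The toy corpus, tags and hyperparameter values below are illustrative assumptions, not settings from the paper:

    from gensim.models.doc2vec import Doc2Vec, TaggedDocument

    # Each paragraph gets its own tag: paragraph vectors are unique per
    # paragraph, while word vectors are shared across the corpus.
    corpus = [
        TaggedDocument("machine learning needs fixed length vectors".split(), ["doc0"]),
        TaggedDocument("paragraph vectors predict words in context".split(), ["doc1"]),
    ]

    # dm=1 selects the distributed-memory variant, which combines the
    # paragraph vector with context word vectors to predict the next word.
    model = Doc2Vec(corpus, dm=1, vector_size=50, window=2, min_count=1, epochs=40)

    # Inference: word vectors stay fixed; a fresh paragraph vector is
    # trained by gradient descent until convergence.
    new_vec = model.infer_vector("an unseen paragraph about learning".split())
    print(new_vec.shape)  # (50,)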
Our technique is inspired by the recent work in learning vector representations of words using neural networks (Bengio et al., 2006; Collobert & Weston, 2008; Mnih & Hinton, 2008; Turian et al., 2010; Mikolov et al., 2013a;c). In their formulation, each word is represented by a vector which is concatenated or averaged with other word vectors in a context, and the resulting vector is used to predict other words in the context. For example, the neural network language model proposed in (Bengio et al., 2006) uses the concatenation of several previous word vectors to form the input of a neural network, and tries to predict the next word. The outcome is that, after the model is trained, the word vectors are mapped into a vector space such that semantically similar words have similar vector representations (e.g., “strong” is close to “powerful”).
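
As a rough sketch of this family of models (not Bengio et al.'s exact architecture), the following PyTorch snippet concatenates the embeddings of several previous words and scores the next word over the vocabulary; all sizes and word indices are made up:

    import torch
    import torch.nn as nn

    vocab_size, embed_dim, context_size, hidden = 1000, 32, 3, 64

    embeddings = nn.Embedding(vocab_size, embed_dim)
    mlp = nn.Sequential(
        nn.Linear(context_size * embed_dim, hidden),
        nn.Tanh(),
        nn.Linear(hidden, vocab_size),  # one score per vocabulary word
    )

    context = torch.tensor([[4, 57, 293]])   # indices of 3 previous words
    x = embeddings(context).view(1, -1)      # concatenated context: (1, 96)
    logits = mlp(x)                          # (1, vocab_size)

    # Training step: the loss gradient updates the word vectors too.
    loss = nn.functional.cross_entropy(logits, torch.tensor([12]))
    loss.backward()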
Following these successful techniques, researchers have tried to extend the models to go beyond the word level and achieve phrase-level or sentence-level representations (Mitchell & Lapata, 2010; Zanzotto et al., 2010; Yessenalina & Cardie, 2011; Grefenstette et al., 2013; Mikolov et al., 2013c). For instance, a simple approach is using a weighted average of all the words in the document. A more sophisticated approach is combining the word vectors in an order given by a parse tree of a sentence, using matrix-vector operations (Socher et al., 2011b). Both approaches have weaknesses. The first approach, weighted averaging of word vectors, loses the word order in the same way as the standard bag-of-words models do. The second approach, using a parse tree to combine word vectors, has been shown to work only for sentences because it relies on parsing.
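
The weighted-averaging baseline is simple enough to state in a few lines. A minimal sketch with uniform weights and made-up two-dimensional word vectors shows how it discards word order:

    import numpy as np

    word_vecs = {
        "strong": np.array([0.9, 0.1]),
        "coffee": np.array([0.2, 0.8]),
    }

    def avg_doc_vector(tokens, vecs):
        # Any permutation of tokens yields the same document vector.
        return np.mean([vecs[t] for t in tokens], axis=0)

    print(avg_doc_vector(["strong", "coffee"], word_vecs))  # [0.55 0.45]
    print(avg_doc_vector(["coffee", "strong"], word_vecs))  # identical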

     
