Data-Intensive Text Processing with MapReduce (PDF download)

    Download (compiled by this site):
    Link: https://pan.baidu.com/s/1JEozv2POtghefF8pLJ-QRw
    Extraction code: wzrx
    Main content:
    CHAPTER 1. INTRODUCTION
    everything from understanding political discourse in the blogosphere to predicting the
    movement of stock prices.
    There is a growing body of evidence, at least in text processing, that of the three
    components discussed above (data, features, algorithms), data probably matters the
    most. Superficial word-level features coupled with simple models in most cases trump
    sophisticated models over deeper features and less data. But why can’t we have our cake
    and eat it too? Why not both sophisticated models and deep features applied to lots of
    data? Because inference over sophisticated models and extraction of deep features are
    often computationally intensive, they don’t scale well.
    Consider a simple task such as determining the correct usage of easily confusable
    words such as “than” and “then” in English. One can view this as a supervised machine
    learning problem: we can train a classifier to disambiguate between the options, and
    then apply the classifier to new instances of the problem (say, as part of a grammar
    checker). Training data is fairly easy to come by—we can just gather a large corpus of
    texts and assume that most writers make correct choices (the training data may be noisy,
    since people make mistakes, but no matter). In 2001, Banko and Brill [14] published
    what has become a classic paper in natural language processing exploring the effects
    of training data size on classification accuracy, using this task as the specific example.
    They explored several classification algorithms (the exact ones aren’t important, as we
    shall see), and not surprisingly, found that more data led to better accuracy. Across
    many different algorithms, the increase in accuracy was approximately linear in the
    log of the size of the training data. Furthermore, with increasing amounts of training
    data, the accuracy of different algorithms converged, such that pronounced differences
    in effectiveness observed on smaller datasets basically disappeared at scale. This led to
    a somewhat controversial conclusion (at least at the time): machine learning algorithms
    really don’t matter, all that matters is the amount of data you have. This resulted in
    an even more controversial recommendation, delivered somewhat tongue-in-cheek: we
    should just give up working on algorithms and simply spend our time gathering data
    (while waiting for computers to become faster so we can process the data).
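The confusable-word task above can be sketched as a small count-based classifier. This is a toy illustration under my own assumptions (windowed word features, naive-Bayes-style scoring with add-one smoothing), not the specific algorithms Banko and Brill compared; all names here are hypothetical.

```python
import math
from collections import Counter, defaultdict

def context_features(tokens, i, window=2):
    """Words surrounding position i, tagged with their offset."""
    feats = []
    for off in range(-window, window + 1):
        if off == 0:
            continue
        j = i + off
        word = tokens[j] if 0 <= j < len(tokens) else "<pad>"
        feats.append(f"{off}:{word}")
    return feats

class ConfusableWordClassifier:
    """Count-based disambiguator for "than" vs. "then" (illustrative)."""

    LABELS = ("than", "then")

    def __init__(self):
        self.class_counts = Counter()
        self.feat_counts = defaultdict(Counter)

    def train(self, sentences):
        # Each sentence is a token list; every occurrence of a confusable
        # word becomes one (possibly noisy) training example, as in the text.
        for tokens in sentences:
            for i, tok in enumerate(tokens):
                if tok in self.LABELS:
                    self.class_counts[tok] += 1
                    for f in context_features(tokens, i):
                        self.feat_counts[tok][f] += 1

    def predict(self, tokens, i):
        # Score each label: smoothed log prior + log feature likelihoods.
        total = sum(self.class_counts.values())
        best, best_score = None, float("-inf")
        for label in self.LABELS:
            score = math.log((self.class_counts[label] + 1) / (total + 2))
            denom = sum(self.feat_counts[label].values()) + 1
            for f in context_features(tokens, i):
                score += math.log((self.feat_counts[label][f] + 1) / denom)
            if score > best_score:
                best, best_score = label, score
        return best
```

The point of the Banko and Brill result is that, at scale, the choice among classifiers of this general family mattered far less than the amount of training text fed to them.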
    As another example, consider the problem of answering short, fact-based questions
    such as “Who shot Abraham Lincoln?” Instead of returning a list of documents that the
    user would then have to sort through, a question answering (QA) system would directly
    return the answer: John Wilkes Booth. This problem gained interest in the late 1990s,
    when natural language processing researchers approached the challenge with sophisticated linguistic processing techniques such as syntactic and semantic analysis. Around
    2001, researchers discovered a far simpler approach to answering such questions based
    on pattern matching [27, 53, 92]. Suppose you wanted the answer to the above question.
    As it turns out, you can simply search for the phrase “shot Abraham Lincoln” on the
    web and look for what appears to its left. Or better yet, look through multiple instances
    of this phrase and tally up the words that appear to the left. This simple strategy works
    surprisingly well, and has become known as the redundancy-based approach to question
    answering. It capitalizes on the insight that in a very large text collection (i.e., the
    web), answers to commonly-asked questions will be stated in obvious ways, such that
    pattern-matching techniques suffice to extract answers accurately.
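The redundancy-based strategy described above can be sketched in a few lines: scan snippets for the phrase and tally the word spans appearing to its left. The snippets and function name below are hypothetical stand-ins; a real system would pull snippets from a web search and handle far messier text.

```python
from collections import Counter

def tally_left_contexts(snippets, phrase, width=3):
    """Tally the word spans appearing immediately left of `phrase`
    across text snippets; the most frequent span is the answer guess."""
    pat = phrase.lower().split()
    counts = Counter()
    for text in snippets:
        # Crude normalization: treat periods and commas as spaces.
        toks = text.replace(".", " ").replace(",", " ").split()
        low = [t.lower() for t in toks]
        for i in range(len(low) - len(pat) + 1):
            if low[i:i + len(pat)] == pat:
                left = toks[max(0, i - width):i]
                if left:
                    counts[" ".join(left)] += 1
    return counts

# Hypothetical snippets standing in for web search results:
snippets = [
    "Actor John Wilkes Booth shot Abraham Lincoln at Ford's Theatre.",
    "John Wilkes Booth shot Abraham Lincoln in April 1865.",
    "Booth shot Abraham Lincoln, then fled the scene.",
]
counts = tally_left_contexts(snippets, "shot Abraham Lincoln")
```

Because correct answers are restated in many obvious phrasings across a large collection, the true answer accumulates votes while spurious left-contexts stay rare, which is exactly the redundancy the approach exploits.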

     
