Deep Learning: The Core Driving Force of the Intelligent Era (PDF download)

"Deep Learning Theory" notes, PDF download

     
Download compiled by this site:
Link: https://pan.baidu.com/s/14cJovnuosJouDt44Tncj8Q
Extraction code: xge9
     
     
     
Main content:
    Introduction
    Machine learning aims to solve the following problem:
$R(f) \to \min_{f \in F}$.  (1.1)
Here $R(f) = \mathbb{E}_{x, y \sim D}\, r(y, f(x))$ is the true risk of a model f from a class F, and D is the data distribution. However,
we do not have access to the true data distribution; instead, we have a finite set of i.i.d. samples from it:
$S_n = \{(x_i, y_i)\}_{i=1}^n \sim D^n$. For this reason, instead of approaching (1.1), we substitute it with empirical risk
minimization:
$\hat{R}_n(f) \to \min_{f \in F}$,  (1.2)
where $\hat{R}_n(f) = \mathbb{E}_{x, y \in S_n}\, r(y, f(x))$ is the empirical risk of a model f from the class F.
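To make the two risks concrete, here is a minimal Python sketch (the data distribution, the fixed classifier, and the sample sizes are illustrative choices, not part of the notes) that compares the empirical risk on a finite sample S_n with a large-sample Monte Carlo estimate of the true risk:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample(n):
    """Draw n i.i.d. samples from a toy distribution D: label is a thresholded noisy linear score."""
    x = rng.normal(size=(n, 2))
    y = (x @ np.array([1.0, -0.5]) + 0.3 * rng.normal(size=n) > 0).astype(int)
    return x, y

def zero_one_risk(f, x, y):
    """Average 0-1 loss r(y, f(x)) over a sample."""
    return np.mean(f(x) != y)

# A fixed model f from some class F: here, a linear threshold rule.
f = lambda x: (x @ np.array([0.9, -0.4]) > 0).astype(int)

x_train, y_train = sample(100)        # the finite sample S_n
x_big, y_big = sample(1_000_000)      # a huge sample, standing in for the true risk R(f)

print("empirical risk:", zero_one_risk(f, x_train, y_train))
print("approx. true risk:", zero_one_risk(f, x_big, y_big))
```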
    1.1 Generalization ability
How does the solution of (1.2) relate to (1.1)? In other words, we aim to upper-bound the difference between the
    two risks:
$R(\hat{f}_n) - \hat{R}_n(\hat{f}_n) \le \mathrm{bound}(\hat{f}_n, F, n, \delta)$ w.p. $\ge 1 - \delta$ over $S_n$,  (1.3)
where $\hat{f}_n \in F$ is the result of training the model on the dataset $S_n$.
We call the bound (1.3) a-posteriori if it depends on the resulting model $\hat{f}_n$, and we call it a-priori if it does not.
    An a-priori bound allows one to estimate the risk difference before training, while an a-posteriori bound estimates
    the risk difference based on the final model.
    Uniform bounds are instances of an a-priori class:
$R(\hat{f}_n) - \hat{R}_n(\hat{f}_n) \le \sup_{f \in F} |R(f) - \hat{R}_n(f)| \le \mathrm{ubound}(F, n, \delta)$ w.p. $\ge 1 - \delta$ over $S_n$.  (1.4)
    A typical form of the uniform bound is the following:
$\mathrm{ubound}(F, n, \delta) = O\!\left(\sqrt{\frac{C(F) + \log(1/\delta)}{n}}\right)$,  (1.5)
where $C(F)$ is a complexity measure of the class F.
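As a quick numeric illustration of how such a bound behaves (the hidden constant in the $O(\cdot)$ is set to 1 here purely for illustration; real constants depend on the concentration argument used), one can tabulate the bound for a few values of n and C(F):

```python
import math

def ubound(C, n, delta, const=1.0):
    """Illustrative sqrt((C(F) + log(1/delta)) / n); the constant is arbitrary."""
    return const * math.sqrt((C + math.log(1.0 / delta)) / n)

for n in (10**3, 10**5, 10**7):
    for C in (10, 10**4):
        print(f"n={n:>8}  C(F)={C:>6}  bound ~ {ubound(C, n, delta=0.05):.3f}")
```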
Bound (1.5) suggests that the generalization ability, measured by the risk difference, decays as the model
class becomes larger. This suggestion conforms to the classical bias-variance trade-off curve. The curve can be reproduced by fitting the Runge function with a polynomial on a train set of equidistant points; the same phenomenon
can be observed for decision trees.
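A minimal sketch of that Runge-function experiment (the degrees, the number of equidistant points, and the use of numpy.polyfit are my own illustrative choices): as the polynomial degree grows, the train error keeps shrinking while the test error eventually explodes.

```python
import numpy as np

def runge(x):
    """The Runge function 1 / (1 + 25 x^2) on [-1, 1]."""
    return 1.0 / (1.0 + 25.0 * x**2)

x_train = np.linspace(-1, 1, 15)       # equidistant train points
x_test = np.linspace(-1, 1, 1001)      # dense grid, standing in for the true risk
y_train, y_test = runge(x_train), runge(x_test)

for degree in (2, 4, 8, 12, 14):
    coeffs = np.polyfit(x_train, y_train, degree)    # least-squares polynomial fit
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(f"degree {degree:>2}: train MSE {train_mse:.1e}, test MSE {test_mse:.1e}")
```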
    A typical notion of model class complexity is VC-dimension [Vapnik and Chervonenkis, 1971]. For neural networks, VC-dimension grows at least linearly with the number of parameters [Bartlett et al., 2019]. Hence the
bound (1.5) becomes vacuous for large enough nets. However, as we observe, the empirical (train) risk $\hat{R}_n$ vanishes,
    while the true (test) risk saturates for large enough width (see Figure 1 of [Neyshabur et al., 2015]).
    One might hypothesize that the problem is in VC-dimension, which overestimates the complexity of neural nets.
    However, the problem turns out to be in uniform bounds in general. Indeed, if the class F contains a bad network,
    i.e. a network that perfectly fits the train data but fails desperately on the true data distribution, the uniform
    bound (1.4) becomes at least nearly vacuous. In realistic scenarios, such a bad network can be found explicitly:
[Zhang et al., 2016] demonstrated that nets of practical size can fit data with random labels; similarly, these nets
    can fit the training data plus some additional data with random labels. Such nets fit the training data perfectly
    but generalize poorly.
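A toy version of the random-label observation, loosely in the spirit of [Zhang et al., 2016] (the data, the model size, and the use of scikit-learn's MLPClassifier are illustrative, not the setup of the paper): an overparameterized MLP driven to fit random labels reaches near-perfect train accuracy but chance-level test accuracy.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)

# Random inputs with completely random labels: there is no signal to generalize from.
X = rng.normal(size=(200, 20))
y = rng.integers(0, 2, size=200)
X_test = rng.normal(size=(200, 20))
y_test = rng.integers(0, 2, size=200)

# A deliberately overparameterized MLP, trained long enough to interpolate the train set.
net = MLPClassifier(hidden_layer_sizes=(256, 256), max_iter=3000, random_state=0)
net.fit(X, y)

print("train accuracy:", net.score(X, y))            # typically close to 1.0: the net memorizes
print("test accuracy:", net.score(X_test, y_test))   # typically close to 0.5: chance level
```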
    Up to this point, we know that among the networks with zero training risk, some nets generalize well, while
some generalize poorly. Suppose we managed to come up with some model complexity measure that is symptomatic
of poor generalization: bad nets have higher complexity than good ones. If we did, we could come up with a better
bound by prioritizing less complex models.
Such prioritization is naturally supported by the PAC-Bayesian paradigm. First, we come up with a prior distribution P over models. This distribution should not depend on the train dataset $S_n$. Then we build a posterior
distribution $Q \mid S_n$ over models based on the observed data. For instance, if we fix random seeds, a usual network
training procedure gives a posterior distribution concentrated on a single model $\hat{f}_n$. The PAC-Bayesian bound
[McAllester, 1999b] takes the following form:
$R(Q \mid S_n) - \hat{R}_n(Q \mid S_n) \le O\!\left(\sqrt{\frac{\mathrm{KL}(Q \mid S_n \,\|\, P) + \log(1/\delta)}{n}}\right)$ w.p. $\ge 1 - \delta$ over $S_n$,  (1.6)
where $R(Q)$ is the expected risk of models sampled from Q; similarly for $\hat{R}_n(Q)$. If more complex models are less
likely to be found, then we can embed this information into the prior, thus typically making the KL divergence smaller.
The PAC-Bayesian bound (1.6) is an example of an a-posteriori bound, since it depends on Q. However,
    it is possible to obtain an a-priori bound using the same paradigm [Neyshabur et al., 2018].
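For intuition about how the KL term enters the bound, here is a small sketch that evaluates the square-root term of (1.6) when both the prior P and the posterior Q are diagonal Gaussians over the weights (the closed-form Gaussian KL is standard; treating the $O(\cdot)$ as an exact square root, and the particular means, variances, and dimension, are simplifications made for illustration):

```python
import numpy as np

def kl_diag_gaussians(mu_q, var_q, mu_p, var_p):
    """KL(Q || P) between diagonal Gaussians, summed over dimensions."""
    return 0.5 * np.sum(np.log(var_p / var_q) + (var_q + (mu_q - mu_p) ** 2) / var_p - 1.0)

def pac_bayes_gap_term(mu_q, var_q, mu_p, var_p, n, delta):
    """Illustrative sqrt((KL(Q||P) + log(1/delta)) / n) term from (1.6), with constants dropped."""
    kl = kl_diag_gaussians(mu_q, var_q, mu_p, var_p)
    return np.sqrt((kl + np.log(1.0 / delta)) / n)

d = 10_000                                   # number of weights (illustrative)
mu_p, var_p = np.zeros(d), np.ones(d)        # prior P = N(0, I), chosen before seeing S_n
rng = np.random.default_rng(0)
mu_q = rng.normal(scale=0.1, size=d)         # posterior mean: "trained" weights staying near the prior
var_q = np.full(d, 0.05)                     # posterior variance: a noisy version of the trained weights

print("KL(Q || P):", kl_diag_gaussians(mu_q, var_q, mu_p, var_p))
print("gap term:", pac_bayes_gap_term(mu_q, var_q, mu_p, var_p, n=60_000, delta=0.05))
```

The closer the posterior stays to the prior, the smaller the KL term and hence the tighter the bound, which is exactly the prioritization argument above.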
    The bound (1.6) becomes better when our training procedure tends to find models that are probable according
to the prior. But what kind of models does gradient descent typically find? Does it implicitly minimize some
complexity measure of the resulting model? Despite the existence of bad networks, minimizing the train loss with
gradient descent typically yields well-performing solutions. This phenomenon is referred to as the implicit bias of
gradient descent.
Another problem with a-priori bounds is that they are all effectively two-sided: all of them bound the
absolute value of the risk difference rather than the risk difference itself. Two-sided bounds fail if there exist
networks that generalize well while failing on a given train set. [Nagarajan and Kolter, 2019] have constructed a
problem for which such networks are typically found by gradient descent.
    1.2 Global convergence
We have introduced the empirical risk minimization problem (1.2) because we were not able to minimize the true risk
directly: see (1.1). But are we able to minimize the empirical risk? Let $f(x; \theta)$ be a neural net evaluated at input
x with parameters $\theta$. Consider a loss function $\ell$ that is a convex surrogate of the risk r. Then minimizing the train
loss will imply empirical risk minimization:
$\hat{L}_n(\theta) = \mathbb{E}_{x, y \in S_n}\, \ell(y, f(x; \theta)) \to \min_\theta$.  (1.7)
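A minimal sketch of minimizing a convex surrogate in place of the 0-1 risk (the logistic loss, a linear model instead of a deep net, and plain full-batch gradient descent are all simplifications chosen for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy binary classification data with labels in {-1, +1}.
X = rng.normal(size=(200, 5))
w_true = rng.normal(size=5)
y = np.sign(X @ w_true + 0.1 * rng.normal(size=200))

def surrogate_loss(theta):
    """Logistic loss: a convex surrogate for the 0-1 risk r."""
    return np.mean(np.log1p(np.exp(-y * (X @ theta))))

def empirical_risk(theta):
    """The 0-1 empirical risk on the train set."""
    return np.mean(np.sign(X @ theta) != y)

# Plain full-batch gradient descent on the train loss.
theta, lr = np.zeros(5), 0.5
for step in range(500):
    margins = y * (X @ theta)
    grad = -(X * (y / (1.0 + np.exp(margins)))[:, None]).mean(axis=0)
    theta -= lr * grad

print("train surrogate loss:", surrogate_loss(theta))
print("train 0-1 risk:", empirical_risk(theta))
```

Driving the surrogate loss down drives the 0-1 empirical risk down as well, which is the point of using a convex surrogate.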
    Neural nets are complex non-linear functions of both inputs and weights; we can hardly expect the loss landscape
$\hat{L}_n(\theta)$ induced by such functions to be simple. At least, for non-trivial neural nets, $\hat{L}_n$ is a non-convex function of
$\theta$. Hence it can have local minima that are not global.
    The most widely-used method of solving the problem (1.7) for deep learning is gradient descent (GD), or some of
its variants. Since GD is a local method, it cannot have global convergence guarantees in the general case. However,
for practically sized neural nets it is consistently observed to find a global minimum.
Given this observation, it is tempting to hypothesize that, despite the non-convexity, all local minima of $\hat{L}_n(\theta)$
are global. This turns out to be true for linear nets [Kawaguchi, 2016, Lu and Kawaguchi, 2017, Laurent and Brecht, 2018],
    and for non-linear nets if they are sufficiently wide [Nguyen, 2019].
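The linear-net case can be checked numerically. In the sketch below (the architecture, initialization scale, learning rate, and step count are arbitrary illustrative choices), gradient descent on a two-layer linear network approaches the loss of the best single linear map, which is the global minimum of this landscape:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
Y = X @ rng.normal(size=(5, 3)) + 0.1 * rng.normal(size=(100, 3))

# Global optimum over all linear maps (the linear net below can represent any of them).
W_star, *_ = np.linalg.lstsq(X, Y, rcond=None)
best_loss = np.sum((X @ W_star - Y) ** 2) / X.shape[0]

# A two-layer linear net f(x) = x W1 W2, trained by plain gradient descent.
W1 = 0.3 * rng.normal(size=(5, 8))
W2 = 0.3 * rng.normal(size=(8, 3))
lr = 0.01
for step in range(20_000):
    H = X @ W1
    err = H @ W2 - Y
    G = 2.0 * err / X.shape[0]       # gradient of the loss w.r.t. the network output
    gW2 = H.T @ G
    gW1 = X.T @ (G @ W2.T)
    W1 -= lr * gW1
    W2 -= lr * gW2

print("best linear fit loss:", best_loss)
print("two-layer linear net loss after GD:", np.sum((X @ W1 @ W2 - Y) ** 2) / X.shape[0])
```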
    While globality of local minima implies almost sure convergence of gradient descent [Lee et al., 2016, Panageas and Piliouras, 2017],
    there are no guarantees on convergence speed. Generally, convergence speed depends on initialization. For instance,

     

