7. Learning: Overview and Supervised Learning

7.3.2 Linear Regression and Classification

Linear functions provide the basis for many learning algorithms. In this section, we first cover regression, the problem of predicting a real-valued function from training examples. Then we consider the discrete case of classification.

Linear regression is the problem of fitting a linear function to a set of input-output pairs given a set of training examples, in which the input and output features are numeric. Suppose the input features are $X_1, \ldots, X_n$. A linear function of these features is a function of the form

$$f^{w}(X_1, \ldots, X_n) = w_0 + w_1 X_1 + \cdots + w_n X_n,$$

where $w = \langle w_0, w_1, \ldots, w_n \rangle$ is a tuple of weights.

To make $w_0$ not be a special case, we invent a new feature, $X_0$, whose value is always 1. We will learn a function for each target feature independently, so we consider only one target, $Y$. Suppose a set $E$ of examples exists, where each example $e \in E$ has values $\mathit{val}(e, X_i)$ for feature $X_i$ and has an observed value $\mathit{val}(e, Y)$.
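As a minimal sketch of the $X_0$ trick, the Python function below evaluates a linear function by prepending a constant 1 to the inputs, so that $w_0$ is handled like every other weight. The name `predict` and the sample weights are illustrative assumptions, not part of the text.

```python
def predict(weights, inputs):
    """Return w0*1 + w1*x1 + ... + wn*xn for one example."""
    # Prepend the invented feature X0 = 1 so w0 needs no special case.
    features = [1.0] + list(inputs)
    if len(features) != len(weights):
        raise ValueError("expected one weight per feature, including X0")
    return sum(w * x for w, x in zip(weights, features))


# Hypothetical weights <w0, w1, w2> and one example with X1 = 3, X2 = 4.
weights = [0.5, 2.0, -1.0]
print(predict(weights, [3.0, 4.0]))  # 0.5 + 2*3 - 1*4 = 2.5
```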

The predicted value is thus

$$\mathit{pval}^{w}(e, Y) = w_0 + w_1\, \mathit{val}(e, X_1) + \cdots + w_n\, \mathit{val}(e, X_n) = \sum_{i=0}^{n} w_i\, \mathit{val}(e, X_i),$$

where we have made it explicit that the prediction depends on the weights, and where $\mathit{val}(e, X_0)$ is defined to be 1. The sum-of-squares error on examples $E$ for target $Y$ is

$$\mathit{Error}_E(w) = \sum_{e \in E} \left( \mathit{val}(e, Y) - \mathit{pval}^{w}(e, Y) \right)^2 = \sum_{e \in E} \left( \mathit{val}(e, Y) - \sum_{i=0}^{n} w_i\, \mathit{val}(e, X_i) \right)^2. \qquad (7.1)$$

In this linear case, the weights that minimize the error can be computed analytically [see Exercise 7.5 (page 344)].
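The sum-of-squares error can be computed directly from its definition. The sketch below assumes examples are stored as `(inputs, target)` pairs; the function names and the three data points are made up for illustration.

```python
def pval(weights, inputs):
    """Predicted value: sum of w_i * val(e, X_i), with val(e, X0) = 1."""
    return sum(w * x for w, x in zip(weights, [1.0] + list(inputs)))


def sum_of_squares_error(weights, examples):
    """Sum over all examples of (observed - predicted) squared."""
    return sum((y - pval(weights, xs)) ** 2 for xs, y in examples)


# Hypothetical data lying exactly on y = 1 + 2*x.
examples = [([0.0], 1.0), ([1.0], 3.0), ([2.0], 5.0)]
print(sum_of_squares_error([1.0, 2.0], examples))  # 0.0: perfect fit
print(sum_of_squares_error([0.0, 2.0], examples))  # 3.0: each example is off by 1
```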

A more general approach, which can be used for wider classes of functions, is to compute the weights iteratively. Gradient descent (page 149) is an iterative method to find the minimum of a function. Gradient descent starts with an initial set of weights; in each step, it decreases each weight in proportion to its partial derivative:

$$w_i := w_i - \eta\, \frac{\partial\, \mathit{Error}_E(w)}{\partial w_i},$$

where $\eta$, the gradient descent step size, is called the learning rate. The learning rate, as well as the features and the data, is given as input to the learning algorithm. The partial derivative specifies how much a small change in the weight would change the error.
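To make the update rule concrete, here is a rough sketch of gradient descent on an arbitrary error function. It estimates each partial derivative with a central finite difference, which is a stand-in chosen for illustration rather than the analytic derivative; the helper names, the toy error function, and the step size are all assumptions.

```python
def numeric_partial(error, weights, i, h=1e-6):
    """Estimate d error / d w_i by nudging w_i up and down by h."""
    up = list(weights)
    up[i] += h
    down = list(weights)
    down[i] -= h
    return (error(up) - error(down)) / (2 * h)


def gradient_descent_step(error, weights, eta):
    """Apply w_i := w_i - eta * (partial derivative) to every weight."""
    grads = [numeric_partial(error, weights, i) for i in range(len(weights))]
    return [w - eta * g for w, g in zip(weights, grads)]


def error(w):
    """Toy error function (hypothetical) whose minimum is at w = (3, -1)."""
    return (w[0] - 3.0) ** 2 + (w[1] + 1.0) ** 2


w = [0.0, 0.0]
for _ in range(100):
    w = gradient_descent_step(error, w, eta=0.1)
print(w)  # should be close to [3.0, -1.0]
```

A learning rate that is too large makes the steps overshoot the minimum, while one that is too small makes progress slow; 0.1 happens to work for this toy function.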

     1: procedure LinearLearner(X, Y, E, η)
     2:     Inputs
     3:         X: set of input features, X = {X1, ..., Xn}
     4:         Y: target feature
     5:         E: set of examples from which to learn
     6:         η: learning rate
     7:     Output
     8:         parameters w0, ..., wn
     9:     Local
    10:         w0, ..., wn: real numbers
    11:         pval^w(e, Y) = w0 + w1 val(e, X1) + ... + wn val(e, Xn)
    12:     initialize w0, ..., wn randomly
    13:     repeat
    14:         for each example e in E do
    15:             δ := val(e, Y) − pval^w(e, Y)
    16:             for each i ∈ [0, n] do
    17:                 wi := wi + η δ val(e, Xi)
    18:     until termination
    19:     return w0, ..., wn

Figure 7.6: Gradient descent for learning a linear function.
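The procedure in Figure 7.6 could be transcribed into Python roughly as follows. This is a sketch under several assumptions that the figure leaves open: examples are `(inputs, target)` pairs, termination is a fixed number of passes, and the initial weights are drawn from a small seeded random range. The data set is made up.

```python
import random


def linear_learner(examples, eta, num_passes=1000, seed=0):
    """Incremental gradient descent; returns the weights [w0, ..., wn]."""
    rng = random.Random(seed)
    n = len(examples[0][0])
    # Line 12 of the figure: initialize w0..wn randomly (range is an assumption).
    weights = [rng.uniform(-0.1, 0.1) for _ in range(n + 1)]
    for _ in range(num_passes):  # "until termination": here, a fixed pass count
        for inputs, target in examples:
            features = [1.0] + list(inputs)  # val(e, X0) = 1
            predicted = sum(w * x for w, x in zip(weights, features))
            delta = target - predicted  # line 15
            for i, x in enumerate(features):
                weights[i] += eta * delta * x  # line 17
    return weights


# Hypothetical data lying exactly on y = 1 + 2*x.
examples = [([0.0], 1.0), ([1.0], 3.0), ([2.0], 5.0), ([3.0], 7.0)]
w0, w1 = linear_learner(examples, eta=0.05)
print(round(w0, 3), round(w1, 3))  # should be close to 1.0 and 2.0
```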

Consider minimizing the sum-of-squares error. The error is a sum over the examples. The partial derivative of a sum is the sum of the partial derivatives.

Thus, we can consider each example separately and consider how much it changes the weights. The error with respect to example $e$ has a partial derivative with respect to weight $w_i$ of $-2\,[\mathit{val}(e, Y) - \mathit{pval}^{w}(e, Y)]\, \mathit{val}(e, X_i)$. For each example $e$, let $\delta = \mathit{val}(e, Y) - \mathit{pval}^{w}(e, Y)$.
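The partial derivative for a single example follows from the chain rule. The steps below spell it out; the shorthand $\mathit{Error}_e$ for the error on one example is introduced here only for this derivation.

```latex
\begin{align*}
\mathit{Error}_e(w)
  &= \Bigl( \mathit{val}(e, Y) - \sum_{j=0}^{n} w_j\, \mathit{val}(e, X_j) \Bigr)^{2} \\
\frac{\partial\, \mathit{Error}_e(w)}{\partial w_i}
  &= 2 \Bigl( \mathit{val}(e, Y) - \sum_{j=0}^{n} w_j\, \mathit{val}(e, X_j) \Bigr)
     \cdot \frac{\partial}{\partial w_i}
     \Bigl( \mathit{val}(e, Y) - \sum_{j=0}^{n} w_j\, \mathit{val}(e, X_j) \Bigr) \\
  &= 2 \bigl[ \mathit{val}(e, Y) - \mathit{pval}^{w}(e, Y) \bigr]
     \cdot \bigl( -\mathit{val}(e, X_i) \bigr) \\
  &= -2\, \delta\, \mathit{val}(e, X_i)
\end{align*}
```

Only the $j = i$ term of the inner sum depends on $w_i$, which is why a single factor $\mathit{val}(e, X_i)$ survives.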

Thus, each example $e$ updates each weight $w_i$:

$$w_i := w_i + \eta\, \delta\, \mathit{val}(e, X_i), \qquad (7.2)$$

where we have ignored the constant 2, because we assume it is absorbed into the constant $\eta$. Figure 7.6 gives an algorithm, LinearLearner(X, Y, E, η), for learning a linear function that minimizes the sum-of-squares error. Note that, in line 17, $\mathit{val}(e, X_0)$ is 1 for all $e$. Termination is usually after some number of steps, when the error is small or when the changes get small.

The algorithm presented in Figure 7.6 is sometimes called incremental gradient descent because the weights change while it iterates through the examples. An alternative is to save the weights at each iteration of the repeat loop, use the saved weights for computing the function, and then update these saved weights after all of the examples. This process computes the true derivative of the error function, but it is more complicated and often does not work as well.
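For comparison, the alternative that updates only after a full pass might look like the sketch below. The weights are frozen during each pass, the per-example contributions are accumulated, and the update is applied once at the end. Starting from zero weights, the fixed pass count, and the data are assumptions made to keep the example reproducible.

```python
def batch_linear_learner(examples, eta, num_passes=2000):
    """Gradient descent that updates the weights once per pass over the data."""
    n = len(examples[0][0])
    weights = [0.0] * (n + 1)  # assumption: start at zero instead of randomly
    for _ in range(num_passes):
        saved = list(weights)  # saved weights used for every prediction this pass
        totals = [0.0] * (n + 1)  # accumulated delta * val(e, Xi) per weight
        for inputs, target in examples:
            features = [1.0] + list(inputs)  # val(e, X0) = 1
            delta = target - sum(w * x for w, x in zip(saved, features))
            for i, x in enumerate(features):
                totals[i] += delta * x
        # Apply the summed update only after all examples have been seen.
        weights = [w + eta * t for w, t in zip(saved, totals)]
    return weights


# Hypothetical data lying exactly on y = 1 + 2*x.
examples = [([0.0], 1.0), ([1.0], 3.0), ([2.0], 5.0), ([3.0], 7.0)]
b0, b1 = batch_linear_learner(examples, eta=0.05)
print(round(b0, 3), round(b1, 3))  # should be close to 1.0 and 2.0
```

The extra bookkeeping (`saved` and `totals`) is the added complication the text refers to.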