"Sigmoid Function," also called the "Logistic Function": h θ ( x )= g ( θ T x ) z = θ T x g ( z )=1/(1+ e − z ) The following image shows us what the sigmoid function looks like: The function g(z), shown here, maps any real number to the (0, 1) interval, making it useful for transforming an arbitrary-valued function into a function better suited for classification. h θ ( x ) will give us the probability that our output is 1. For example, h θ ( x )=0.7 gives us a probability of 70% that our output is 1. Our probability that our prediction is 0 is just the complement of our probability that it is 1 (e.g. if probability that it is 1 is 70%, then the probability that it is 0 is 30%). h θ ( x )= P ( y =1| x ; θ )=1− P ( y =0| x ; θ ) P ( y =0| x ; θ )+ P ( y =1| x ; θ )=1 reference : Andrew Ug, Machine Learning