Machine Learning with Scala Logistic Regression
Logistic regression with Scala
In the post Machine Learning with Scala Linear Regression we saw how to develop a simple linear regressor with the aid of the Breeze library. In this post, we see how to develop a logistic regression classifier for two-class classification.
Logistic regression is a linear classifier; that is, its decision boundary is a line or, more generally, a hyperplane. The algorithm is to a large extent similar to linear regression, with two notable differences:
- We filter the result of the linear model so that it is mapped to the range $[0, 1]$. Thus, the immediate output of logistic regression can be interpreted as a probability.
- The loss function that we minimize is not the mean squared error (MSE) but the cross-entropy (log loss).
Other than that, the algorithm is the same. Hence, we use a linear model of the form
$$\hat{y}_i = a x_i + b$$
and we filter it through a function so that the output is mapped to the range $[0, 1]$. The sigmoid function
$$\phi(x) = \frac{1}{1 + e^{-x}}$$
can be used for such a filtering.
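As a quick illustration of the filtering step, the following minimal sketch applies the sigmoid to the output of a one-dimensional linear model. It uses only the Scala standard library; the object name `SigmoidSketch` and the coefficient values are made up for the example.

```scala
object SigmoidSketch {
  // Logistic (sigmoid) function: maps any real number into the interval (0, 1)
  def sigmoid(z: Double): Double = 1.0 / (1.0 + math.exp(-z))

  def main(args: Array[String]): Unit = {
    // Hypothetical coefficients of the linear model, chosen only for illustration
    val a = 2.0
    val b = -1.0

    for (x <- Seq(-3.0, 0.0, 0.5, 3.0)) {
      val z = a * x + b   // raw linear output, unbounded
      val p = sigmoid(z)  // filtered output, usable as a probability
      println(f"x = $x%5.2f  linear = $z%6.2f  sigmoid = $p%.4f")
    }
  }
}
```

Large negative linear outputs are pushed towards 0, large positive ones towards 1, and a linear output of exactly 0 maps to 0.5.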
The loss function is the binary cross-entropy
$$L(\mathbf{w}) = -\sum_{i=1}^N \left[ y_i \log(\hat{y}_i) + (1 - y_i) \log(1 - \hat{y}_i) \right]$$
where $\hat{y}_i$ now denotes the filtered output $\phi(a x_i + b)$ and $\mathbf{w} = [a, b]$ is the vector of model parameters.
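To make the loss concrete, here is a minimal sketch that evaluates the binary cross-entropy, $-\sum_i [y_i \log p_i + (1 - y_i) \log(1 - p_i)]$ with $p_i = \phi(a x_i + b)$, on a tiny data set. The data, the object name `CrossEntropySketch` and the parameter values are invented for illustration, and the code uses plain Scala collections rather than Breeze.

```scala
object CrossEntropySketch {
  def sigmoid(z: Double): Double = 1.0 / (1.0 + math.exp(-z))

  // Binary cross-entropy of a one-feature model with slope a and intercept b
  def loss(xs: Seq[Double], ys: Seq[Double], a: Double, b: Double): Double =
    xs.zip(ys).map { case (x, y) =>
      val p = sigmoid(a * x + b) // predicted probability of class 1
      -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))
    }.sum

  def main(args: Array[String]): Unit = {
    // Toy data (made up): negative x belongs to class 0, positive x to class 1
    val xs = Seq(-2.0, -1.0, 1.0, 2.0)
    val ys = Seq(0.0, 0.0, 1.0, 1.0)

    // With a = 0 and b = 0 every predicted probability is 0.5
    println(s"loss with a = 0, b = 0: ${loss(xs, ys, 0.0, 0.0)}")
    // A slope that separates the two classes gives a smaller loss
    println(s"loss with a = 2, b = 0: ${loss(xs, ys, 2.0, 0.0)}")
  }
}
```

Minimizing this quantity over the parameters is what the training step does.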
We first import some useful packages
import breeze.linalg.{DenseMatrix, DenseVector}
import breeze.linalg._
import breeze.numerics.{exp, log1p, sigmoid}
import breeze.optimize.{DiffFunction, minimize}
We wrap the loss function and its gradient calculation into an object
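object LogisticRegression {

  // Negative log-likelihood (cross-entropy) of the parameters,
  // given the feature matrix x and the 0/1 targets y
  def L(x: DenseMatrix[Double], y: DenseVector[Double],
        parameters: DenseVector[Double]): Double = {
    val xBeta = x * parameters        // linear scores, one per row of x
    val expXBeta = exp(xBeta)
    val targets_time = y *:* xBeta    // element-wise product of targets and scores
    -sum(targets_time - log1p(expXBeta))
  }

  // Gradient of L with respect to the parameters: x^T (sigmoid(x * parameters) - y)
  def gradL(x: DenseMatrix[Double], y: DenseVector[Double],
            parameters: DenseVector[Double]): DenseVector[Double] = {
    val xBeta = x * parameters
    val probs = sigmoid(xBeta)        // predicted probabilities of class 1
    x.t * (probs - y)
  }
}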
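The expression computed in `L` does not look like a cross-entropy at first sight. A short derivation shows that the two agree. Here $z_i$ is notation introduced only for this derivation and stands for the linear score of sample $i$, that is, the $i$-th entry of `xBeta`, with $\hat{y}_i = \phi(z_i)$:

$$\begin{aligned}
\log \hat{y}_i &= \log \frac{e^{z_i}}{1 + e^{z_i}} = z_i - \log(1 + e^{z_i}) \\
\log (1 - \hat{y}_i) &= \log \frac{1}{1 + e^{z_i}} = -\log(1 + e^{z_i}) \\
y_i \log \hat{y}_i + (1 - y_i) \log(1 - \hat{y}_i) &= y_i z_i - \log(1 + e^{z_i})
\end{aligned}$$

Summing over the samples and negating gives $-\sum_i \left[ y_i z_i - \log(1 + e^{z_i}) \right]$, which is what `-sum(targets_time - log1p(expXBeta))` evaluates. Differentiating with respect to the parameters, and using $\frac{d}{dz}\log(1 + e^{z}) = \phi(z)$, gives

$$\nabla L(\mathbf{w}) = \sum_i \left( \phi(z_i) - y_i \right) \mathbf{x}_i = X^T \left( \phi(X \mathbf{w}) - \mathbf{y} \right)$$

which is the expression returned by `gradL`.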
This is the class that wraps the logistic regression model.
class LogisticRegression {

  // The model parameters
  var parameters: DenseVector[Double] = null

  // Flag indicating if the intercept term is used
  var useIntercept: Boolean = true

  // auxiliary constructor
  def this(numFeatures: Int, useIntercept: Boolean = true) {
    this()
    init(numFeatures = numFeatures, useIntercept = useIntercept)
  }

  // initialize the underlying data
  def init(numFeatures: Int, useIntercept: Boolean = true): Unit = {
    val totalFeatures = if (useIntercept) numFeatures + 1 else numFeatures
    this.parameters = DenseVector.zeros[Double](totalFeatures)
    this.useIntercept = useIntercept
  }

  // train the model; when the intercept is used, x must contain a leading column of ones
  def train(x: DenseMatrix[Double], y: DenseVector[Double]): Unit = {
    // set up the optimization: the objective returns the loss and its gradient
    val f = new DiffFunction[DenseVector[Double]] {
      def calculate(parameters: DenseVector[Double]) =
        (LogisticRegression.L(x, y, parameters = parameters),
          LogisticRegression.gradL(x, y, parameters = parameters))
    }
    this.parameters = minimize(f, this.parameters)
  }

  // predict the class (1.0 or 0.0) of the given point
  def predict(x: DenseVector[Double]): Double = {
    require(parameters != null)
    val score =
      if (!useIntercept) {
        require(x.size == parameters.size)
        parameters dot x
      } else {
        require(x.size == parameters.size - 1)
        // parameters(0) is the intercept; the remaining entries are the feature weights
        parameters(0) + (parameters.slice(1, parameters.size) dot x)
      }
    // sigmoid(score) >= 0.5 exactly when score >= 0
    if (score >= 0.0) 1.0 else 0.0
  }
}
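The step from fitted parameters to a class label can be sketched without Breeze as follows. This is an illustration under assumptions, not the class above: the object `PredictSketch`, its method names and the `threshold` argument are hypothetical, and the first parameter is taken to be the intercept, matching a feature matrix whose first column is all ones.

```scala
object PredictSketch {
  def sigmoid(z: Double): Double = 1.0 / (1.0 + math.exp(-z))

  // Probability of class 1. Assumed layout: params(0) is the intercept,
  // params(1), params(2), ... are the feature weights.
  def probability(params: Array[Double], x: Array[Double]): Double = {
    require(x.length == params.length - 1, "expected one weight per feature plus an intercept")
    val score = params(0) + params.tail.zip(x).map { case (w, xi) => w * xi }.sum
    sigmoid(score)
  }

  // Class 1 when the probability reaches the threshold, class 0 otherwise
  def classify(params: Array[Double], x: Array[Double], threshold: Double = 0.5): Double =
    if (probability(params, x) >= threshold) 1.0 else 0.0

  def main(args: Array[String]): Unit = {
    // Made-up parameters: intercept 0, both feature weights 1
    val params = Array(0.0, 1.0, 1.0)
    println(classify(params, Array(1.0, 0.5)))   // positive score
    println(classify(params, Array(-1.0, -0.5))) // negative score
  }
}
```

With a threshold of 0.5 the decision depends only on the sign of the linear score, which is why the decision boundary is a hyperplane.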
Let's put this into action with a simple example.
import breeze.linalg._
import breeze.numerics._
import breeze.optimize._
import breeze.stats._
import engine.models.LogisticRegression
import engine.utils.{CSVDataSetLoader, VectorUtils}
import spire.algebra.NormedVectorSpace.InnerProductSpaceIsNormedVectorSpace
import spire.implicits.rightModuleOps
object LogisticRegression_Exe extends App {

  println(s"Starting application: ${LogisticRegression_Exe.getClass.getName}")

  // load the data
  val data = CSVDataSetLoader.loadRepHeightWeightsFullData

  // standardize both features
  val rescaledHeights = VectorUtils.standardize(data.heights)
  val rescaledWeights = VectorUtils.standardize(data.weights)

  // turn each feature vector into a column matrix
  val rescaledHeightsAsMatrix = rescaledHeights.toDenseMatrix.t
  val rescaledWeightsAsMatrix = rescaledWeights.toDenseMatrix.t

  // the leading column of ones accounts for the intercept term
  val featureMatrix = DenseMatrix.horzcat(DenseMatrix.ones[Double](rescaledHeightsAsMatrix.rows, 1),
    rescaledHeightsAsMatrix, rescaledWeightsAsMatrix)

  println(s"Feature matrix shape (${featureMatrix.rows}, ${featureMatrix.cols})")

  // encode the targets: 1.0 for 'M', 0.0 otherwise
  val targets = data.genders.values.map{gender => if(gender == 'M') 1.0 else 0.0}
  println(s"Targets vector shape (${targets.size}, )")

  // logistic regression model
  val lr = new LogisticRegression

  // initialize the model
  lr.init(numFeatures=2)
  lr.train(x=featureMatrix, y=targets)

  val optimalParams = lr.parameters
  println(s"Optimal parameters ${optimalParams}")
  println("Done...")
}
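The example above only prints the fitted parameters. One simple way to judge the fit is the fraction of points that are classified correctly. The following is a minimal sketch of that idea; the object `AccuracySketch` and the label values are hypothetical and are not taken from the data set used in this post.

```scala
object AccuracySketch {
  // Fraction of predicted labels that match the 0/1 targets
  def accuracy(predicted: Seq[Double], targets: Seq[Double]): Double = {
    require(predicted.length == targets.length && targets.nonEmpty)
    predicted.zip(targets).count { case (p, t) => p == t }.toDouble / targets.length
  }

  def main(args: Array[String]): Unit = {
    // Made-up labels: three of the four predictions agree with the targets
    val predicted = Seq(1.0, 0.0, 1.0, 1.0)
    val targets = Seq(1.0, 0.0, 0.0, 1.0)
    println(s"accuracy = ${accuracy(predicted, targets)}")
  }
}
```

In practice the predicted labels would come from applying the trained model to each data point.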
You can find the complete example in this repo.
In this post we looked at how to develop a simple logistic regression model with Scala. The Scala numerics library Breeze greatly simplifies the development.
- Logistic regression
- Pascal Bugnion, Patric R. Nicolas, Alex Kozlov, Scala: Applied Machine Learning