
Getting started with Layer


In this quick walkthrough, we will train a machine learning model to predict the survivors of the Titanic disaster and deploy it for real-time inference using Layer.

Installation

!pip install scikit-learn numpy pandas
!pip install -U layer -q
from layer.decorators import dataset, model, resources
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
import pandas as pd
import numpy as np
import layer
from layer import Dataset

Register and Login

To start using Layer, you have to register and log in. Run the following cell, click the link, register, and paste the code into the input field.

layer.login()

Initialize Your First Layer Project

It's time to create your first Layer Project. Once created, you can find it at https://app.layer.ai

layer.init("titanic")

Build Passengers Dataset

Let's start building the data to train our model. We will be using the Kaggle Titanic Dataset, which consists of two files:

  • train.csv
  • test.csv

Let's clone the Layer Titanic Project repo, which contains these files.

!git clone https://github.com/layerai/examples
!mv ./examples/titanic/* ./
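
If you want to double-check that the files landed where we expect, a quick sanity check (purely illustrative) looks like this:

import os
# The move above should have placed the two Kaggle CSVs under ./data
print(sorted(os.listdir("./data")))  # expect 'test.csv' and 'train.csv' in the listing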

Ok, we have our data ready in the ./data folder. Next, we'll merge these datasets and transform them so that we can feed the result to our model.

def clean_gender(sex):
    result = 0
    if sex == "female":
        result = 0
    elif sex == "male":
        result = 1
    return result


def clean_age(data):
    age = data[0]
    pclass = data[1]
    if pd.isnull(age):
        # Impute missing ages with a per-class typical age
        if pclass == 1:
            return 37
        elif pclass == 2:
            return 29
        else:
            return 24
    else:
        return age
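
If you'd like to convince yourself these helpers behave as intended, here are a couple of quick, purely illustrative checks:

assert clean_gender("female") == 0 and clean_gender("male") == 1
assert clean_age([np.nan, 2]) == 29  # missing age in 2nd class is imputed as 29
assert clean_age([40, 1]) == 40      # known ages pass through unchanged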

@dataset("passengers")
@resources(path="./data")
def build_passengers_dataset():
    train_df = pd.read_csv("data/train.csv")
    test_df = pd.read_csv("data/test.csv")
    # DataFrame.append is deprecated; use pd.concat to merge the two splits
    df = pd.concat([train_df, test_df])

    df['Sex'] = df['Sex'].apply(clean_gender)
    df['Age'] = df[['Age', 'Pclass']].apply(clean_age, axis=1)
    df = df.drop(["PassengerId", "Name", "Cabin", "Ticket", "Embarked"], axis=1)

    return df

# You can run this function locally for debugging:
# build_passengers_dataset()

# When ready, you can pass the function to Layer to build your dataset
layer.run([build_passengers_dataset])

Train Survival Model

We will train a RandomForestClassifier to predict the survivors. As you can see, the following is a plain training function; we just added the @model decorator to integrate Layer into our pipeline.

@model(name='survival_model', dependencies=[Dataset('passengers')])
def train():
    parameters = {
        "test_size": 0.25,
        "random_state": 42,
        "n_estimators": 100
    }

    # You can log parameters to compare your experiments later
    layer.log(parameters)

    # You can now load the `passengers` training dataset from Layer
    df = layer.get_dataset("passengers").to_pandas()

    df.dropna(inplace=True)
    X = df.drop(["Survived"], axis=1)
    y = df["Survived"]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=parameters["test_size"], random_state=parameters["random_state"]
    )

    random_forest = RandomForestClassifier(n_estimators=parameters["n_estimators"])
    random_forest.fit(X_train, y_train)

    # Here we log the accuracy
    y_pred = random_forest.predict(X_test)
    layer.log({"accuracy": accuracy_score(y_test, y_pred)})

    # You can just return the model and Layer will semantically version
    # your model and store it under your project
    return random_forest

# You can train your model locally. Layer will log the parameters and metrics and
# upload the trained model to help you track your experiments, even the local ones:
# train()

# When ready, just pass your training function to Layer
layer.run([train])

You can run the entire pipeline by passing all your functions to layer.run. To let Layer execute the pipeline in an optimized order, we declared the passengers dataset as a dependency of the model.

layer.run([build_passengers_dataset,train])

NOTE: Be careful when passing state from outside your function into the function itself. While it's technically possible, caution is needed in two main cases:

1) Passing database (or other external) connections. Such connections might not work as expected as their state can't be serialized when submitted for remote execution.

2) Passing big objects, such as datasets or models, can be prohibitively expensive because those objects are sent over the network when submitted for remote execution. It's recommended to build or fetch such objects within the function itself.
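
As a rule of thumb, create such state inside the decorated function so it is built fresh on whichever machine runs the code. A minimal sketch, where the database file, dataset name, and table are all hypothetical:

@dataset("passengers_from_db")
def build_from_db():
    import sqlite3
    # Recommended: open the connection inside the function; a connection
    # created outside and captured by the closure may fail to serialize
    # for remote execution.
    conn = sqlite3.connect("passengers.db")  # hypothetical database file
    df = pd.read_sql("SELECT * FROM passengers", conn)  # hypothetical table
    conn.close()
    return df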

Results

After you train your model, you can see all your datasets and model experiments in the Layer interface:

https://app.layer.ai/

Or you can re-use one of the entities you have created.

Fetching an ML model from Layer

After you build the Layer Project, you can fetch your trained models from Layer by simply calling layer.get_model("MODEL_NAME")

If you make your project public, anyone, even without a Layer account, can fetch your models for their own use.
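
For example, assuming this project lives under an account named my-org (a hypothetical name; the path layout account/project/models/model-name is illustrative), someone else could fetch it like this:

# Fetch a model from a public project by its fully qualified path (illustrative)
model = layer.get_model("my-org/titanic/models/survival_model").get_train()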

survival_model = layer.get_model("survival_model").get_train()
survival_model
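
Since get_train() returns the trained scikit-learn estimator itself, you can inspect it like any other scikit-learn model, for example:

# Peek at the forest's hyperparameters and learned feature importances
print(survival_model.n_estimators)
print(survival_model.feature_importances_)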

Fetching a dataset from Layer

Just like models, you can fetch a dataset that you have built under a Layer Project. Anyone can fetch and use your dataset if your project is public.

df = layer.get_dataset("passengers").to_pandas()
df.head()

Now, let's predict the survival probability of a passenger with the model and dataset we have built and registered to Layer.

# Drop rows with missing values, since RandomForestClassifier cannot score NaNs
passenger = df[['Pclass', 'Sex', 'Age', 'SibSp', 'Parch', 'Fare']].dropna()
survival_probability = survival_model.predict_proba(passenger.sample())[0][1]
print(f"Survival Probability: {survival_probability:.2%}")

Where to go from here?

Now that you have created your first Layer Project, you can: