Designing your first custom eML

Designing a custom eML requires basic information such as the data format and the intended usage. The eML must therefore be associated with a dataset, which depends on the application. The ML algorithm is then trained and converted. Finally, the eML can be executed on the embedded device.

ML algorithm choice

Many ML algorithms exist; however, 6TRON has only tested a few:

  • LGBM Classifier
  • LGBM Regressor
  • Deep Neural Network

Each algorithm's Python library must be installed in the Python environment used.

Python install

## LGBM Regressor & Classifier
> pip install lightgbm
## Deep Neural Network
> pip install tensorflow
## Other ML algorithms
> pip install scikit-learn

The example code below shows various ML algorithms for anomaly detection and state classification.

Python example code

## LGBM Classifier
import lightgbm as lgbm

LGBMC_model = lgbm.LGBMClassifier(num_leaves=4, max_depth=2)
## LGBM Regressor
import lightgbm as lgbm

LGBMR_model = lgbm.LGBMRegressor(num_leaves=4, max_depth=2)
## Neural Network
# pip install tensorflow
import tensorflow as tf

# input_shape (e.g. (4,) for four input variables) and output_shape
# (the number of classes) must be defined for the application
DNN_model = tf.keras.Sequential([
    tf.keras.layers.Dense(10, input_shape=input_shape, activation='relu'),
    tf.keras.layers.Dense(10, activation='relu'),
    tf.keras.layers.Dense(output_shape, activation='softmax')
])

The Regressor and Classifier algorithms from LightGBM share the same parameters, used here to reduce memory usage: num_leaves sets the maximum number of leaves for the base learners, and max_depth sets the maximum tree depth for the base learners. For more information, follow these links: Regressor, Classifier. The Deep Neural Network relies on the TensorFlow API. The network is built by adding layers that classify data into predefined classes. More information on the different layers and their usage can be found here: Tensorflow. For more information on other ML algorithms, follow this link.
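
To see why these two parameters bound the model size, note that a binary tree of depth d has at most 2^d leaves, so each tree holds at most min(num_leaves, 2^max_depth) leaves. The sketch below only computes that upper bound; the helper names and the n_trees value are hypothetical and are not part of the LightGBM API.

```python
def max_leaves_per_tree(num_leaves: int, max_depth: int) -> int:
    """Upper bound on the number of leaves in one tree.

    A binary tree of depth d has at most 2**d leaves. LightGBM treats
    max_depth <= 0 as "no depth limit", so only num_leaves applies then.
    """
    if max_depth <= 0:
        return num_leaves
    return min(num_leaves, 2 ** max_depth)


def max_total_leaves(num_leaves: int, max_depth: int, n_trees: int) -> int:
    """Upper bound on leaves over the whole ensemble (n_trees is an assumption)."""
    return n_trees * max_leaves_per_tree(num_leaves, max_depth)


# Settings used in this guide: num_leaves=4, max_depth=2
print(max_leaves_per_tree(4, 2))      # 4
# Assuming an ensemble of 100 trees
print(max_total_leaves(4, 2, 100))    # 400
```

Since every leaf and every split ends up as a constant in the generated C code, this bound gives a rough idea of how the exported eML grows with the parameters.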

Dataset selection

The first things to take into account when selecting the dataset are the characteristics of the ML algorithm. For instance, the dataset might need to be larger if the chosen ML algorithm is a deep learning algorithm.

Note: The required dataset size grows with several factors:

  • The number of inputs to the ML algorithm.
  • The complexity of the ML algorithm.
  • An uncontrolled environment, which requires more data.

Existing Dataset

A pre-existing dataset can be used. In this case, any pre-processing (data modification) applied before training the ML algorithm needs to be identified:

Python pseudo-code

def pre_process(dataset):
    dataset = modification_1(dataset)
    dataset = modification_2(dataset)

    return dataset


training_dataset = pre_process(dataset)

As shown above, any data modification in the pre_process function (modification_1 and modification_2) has a substantial impact on the implementation step. Indeed, data acquired on board must have the same characteristics as the training dataset. Consequently, the same modification functions may need to be implemented on the device too.
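
As an illustration of what modification_1 and modification_2 could look like, the sketch below normalizes a 4-20 mA reading and then smooths it with a moving average. Both steps, their names and their constants are hypothetical examples, not the pre-processing of any particular dataset; the point is that the same constants would have to be reused on the device.

```python
from collections import deque

# Hypothetical constants: they must be identical at training time and on the device.
CURRENT_MIN_MA = 4.0
CURRENT_MAX_MA = 20.0
WINDOW = 3


def normalize(sample_ma: float) -> float:
    """Map a 4-20 mA reading to the range [0, 1]."""
    return (sample_ma - CURRENT_MIN_MA) / (CURRENT_MAX_MA - CURRENT_MIN_MA)


def moving_average(samples, window=WINDOW):
    """Smooth samples with a trailing window (shorter at the start)."""
    buf = deque(maxlen=window)
    out = []
    for s in samples:
        buf.append(s)
        out.append(sum(buf) / len(buf))
    return out


def pre_process(samples_ma):
    """Apply both modifications in the order used for training."""
    return moving_average([normalize(s) for s in samples_ma])


print(pre_process([4.0, 12.0, 20.0, 20.0]))
```

If the embedded code used a different window length or current range, the eML would receive data it was never trained on, even though the sensor is the same.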

Own Dataset

On the other hand, if no dataset exists for the application, it might be necessary to generate a new one. Generating the dataset brings several benefits:

  • Acquisition step is already implemented as the dataset generation is executed on hardware.
  • Data processing is fully controlled allowing consistency between training and inference.
  • It allows dataset customisation (data precision, data span, etc.).
  • It also allows the environment to be controlled, which reduces the dataset size.

⚠️ Controlling the environment might bias the ML algorithm, making it sensitive to slight changes in the environment.

📝 The pre-processing function might instead be performed by the ML algorithm itself. However, this leads to a more complex and larger ML algorithm.

ML algorithm training

Each ML algorithm has its own API for training. Once the ML algorithm is chosen, its library determines the training API. However, there are some variations to take into account.

First, the dataset is extracted from a file; in the example below, the dataset is contained in a single file. The dataset is then split into training and test sets for training and validation.

Python data extraction from dataset

import pandas as pd
from sklearn.model_selection import train_test_split

# Load the data into a Pandas DataFrame
data_frame = pd.read_csv("dataset_file.csv")
data_frame.columns = ["var1", "var2", "var3", "var4", "target"]
data_frame = data_frame.sort_values('target')

# Split the data into training and test sets
X = data_frame.drop(columns=["target"])
y = data_frame["target"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

The dataset is then processed differently depending on the ML algorithm selected.

LGBM Classifier

## TRAIN
# Train the classifier using the training data
LGBMC_model.fit(X_train, y_train)

## TEST
# Test the classifier using the test data
y_pred = LGBMC_model.predict(X_test)

# Evaluate the performance of the classifier
from sklearn.metrics import accuracy_score

accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.2f}")

Training the LightGBM Classifier is simple: the fit function only takes X_train and y_train. Training the LightGBM Regressor, however, is more involved, as shown below.

LGBM Regressor

## DATASET
# Choose "var2" as the input to be predicted
y = data_frame["var2"]
# Remove "var2" and "target" from the inputs of the algorithm
X = data_frame.drop(columns=["var2", "target"])

# Only take a small amount of data to train
X_train = X[0:60000]
y_train = y[0:60000]

## TRAIN
# Train the model
LGBMR_model.fit(X_train, y_train)

The principle behind the LightGBM Regressor is that the ML algorithm tries to predict one of the system inputs; the correlation between the real input and the prediction then determines whether there is an anomaly. Training such an algorithm requires the dataset to be constructed differently, as shown above. Testing this algorithm is more complex and is not covered here; however, the Jupyter notebook in which the test is described can be found here.
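
To make the idea of comparing the real input with its prediction more concrete, here is a minimal sketch that flags a sample when the absolute prediction error exceeds a threshold. This simple residual check is an assumption chosen for illustration; it is not necessarily the test used in the notebook, and the threshold and sample values are made up.

```python
def anomaly_flags(actual, predicted, threshold):
    """Flag samples whose absolute prediction error exceeds threshold."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return [abs(a - p) > threshold for a, p in zip(actual, predicted)]


# Illustrative values only: measured "var2" and the model's predictions of "var2".
measured = [10.0, 10.2, 9.9, 14.5]
predicted = [10.1, 10.0, 10.0, 10.1]

print(anomaly_flags(measured, predicted, threshold=1.0))
# [False, False, False, True]
```

In practice, the predicted values would come from LGBMR_model.predict on held-out data, and the threshold would be tuned on data known to be free of anomalies.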

Neural Network

# Compile the model
DNN_model.compile(optimizer='adam',
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])

# Define the input data and labels
# One-hot encode the labels; 5 is the number of classes and must match output_shape
from tensorflow.keras.utils import to_categorical
one_hot = to_categorical(y_train, 5)

# Fit the model on the input data and labels
history = DNN_model.fit(X_train, one_hot, batch_size=128, epochs=20)

The Deep Neural Network training shown here is the basic training procedure described in the TensorFlow documentation; more information can be found here.
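
The categorical_crossentropy loss expects one-hot encoded labels, which is why to_categorical is applied to y_train before fitting. The pure-Python function below is a hypothetical stand-in written only to show what that encoding produces for integer class labels; it is not the Keras implementation.

```python
def one_hot_encode(labels, num_classes):
    """Return one row per label with a 1.0 at the label's index."""
    rows = []
    for label in labels:
        if not 0 <= label < num_classes:
            raise ValueError(f"label {label} outside 0..{num_classes - 1}")
        row = [0.0] * num_classes
        row[label] = 1.0
        rows.append(row)
    return rows


print(one_hot_encode([0, 2, 4], 5))
# [[1.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 1.0]]
```

Each row has the same length as the softmax output layer, so the number of classes passed to the encoder has to match the size of that layer.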

eML conversion

eML conversion from an ML algorithm depends on the API of the algorithm. 6TRON uses the m2cgen Python library for pure machine learning algorithms and the TensorFlow API for deep learning algorithms. m2cgen can export to multiple languages; since 6TRON uses C code, the export_to_c function is used, as shown below.

M2CGEN Python example

## Import library
import m2cgen as m2c

# Export model (LGBMC_model here; use LGBMR_model for the regressor)
with open('LGBM_file_name.c', 'w') as f:
    f.write(m2c.export_to_c(LGBMC_model, function_name="LGBM_function_name"))
# Reduce implementation size by transforming double into float
# (the "!" prefix runs a shell command from a Jupyter notebook)
!sed -i 's/double/float/g' {"LGBM_file_name.c"}

📝 A double variable takes 8 bytes of storage, whereas a float variable takes 4 bytes. Replacing double with float therefore lets the data in the resulting m2cgen C file take about half as much memory.
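
The sizes quoted above, and the small precision loss that comes with the conversion, can be checked from Python with the standard struct module. The threshold value below is a made-up example of a constant that could appear in a generated tree.

```python
import struct

# Size in bytes of IEEE 754 double ("d") and single ("f") precision values.
print(struct.calcsize("<d"), struct.calcsize("<f"))  # 8 4


def to_float32(value: float) -> float:
    """Round a Python float (a double) to the nearest float32 value."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


threshold = 0.1  # hypothetical split threshold from a generated tree
# Rounding error introduced by storing the threshold as a float
print(abs(to_float32(threshold) - threshold))
```

The error is tiny compared with typical sensor noise, but it may be worth comparing the predictions of the float version against the original model on a few samples.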

Tensorflow Python example

# Convert the model to the TensorFlow Lite format without quantization
converter = tf.lite.TFLiteConverter.from_keras_model(DNN_model)
model_no_quant_tflite = converter.convert()

# Convert the model to the TensorFlow Lite format with quantization
def representative_dataset():
    # X_train is a DataFrame: yield one float32 row (a batch of 1) at a time
    for i in range(500):
        yield [X_train.iloc[i:i + 1].to_numpy().astype('float32')]

# Set the optimization flag.
converter.optimizations = [tf.lite.Optimize.DEFAULT]
# Enforce integer only quantization
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8
# Provide a representative dataset to ensure we quantize correctly.
converter.representative_dataset = representative_dataset

model_tflite = converter.convert()

# Save the model to disk
with open("models/model.tflite", "wb") as f:
    f.write(model_tflite)

# Install xxd if it is not available
# !apt-get update && apt-get -qq install xxd
# Convert to a C source file, i.e, a TensorFlow Lite for Microcontrollers model
!xxd -i {"models/model.tflite"} > {"models/neural_network.c"}
# Update variable names (xxd derives them from the input file path)
REPLACE_TEXT = "models/model.tflite".replace('/', '_').replace('.', '_')
!sed -i 's/'{REPLACE_TEXT}'/g_model/g' {"models/neural_network.c"}
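
With integer-only quantization, the model takes int8 inputs and returns int8 outputs, so real values have to be mapped to and from int8 around the inference. TensorFlow Lite uses the affine mapping real = scale * (q - zero_point). The sketch below applies that formula with made-up scale and zero_point values; the actual parameters are stored in the converted model.

```python
def quantize(real_value: float, scale: float, zero_point: int) -> int:
    """Map a real value to int8 using q = round(x / scale) + zero_point."""
    q = round(real_value / scale) + zero_point
    return max(-128, min(127, q))  # clamp to the int8 range


def dequantize(q: int, scale: float, zero_point: int) -> float:
    """Inverse mapping: x = scale * (q - zero_point)."""
    return scale * (q - zero_point)


# Hypothetical parameters, chosen only to keep the arithmetic easy to follow.
SCALE, ZERO_POINT = 0.5, -10

print(quantize(3.0, SCALE, ZERO_POINT))      # -4
print(dequantize(-4, SCALE, ZERO_POINT))     # 3.0
print(quantize(1000.0, SCALE, ZERO_POINT))   # 127 (clamped)
```

On the embedded side, the same two formulas would be applied to the acquired data before inference and to the model output afterwards.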

eML execution

Once the ML algorithm is converted to an eML, the acquisition process and the data pre-processing need to be implemented. As presented below, the example code for data generation on the Zest_sensor_4-20mA can be reused, making sure that the pre-processing function corresponds to the pre-processing identified in the Dataset selection section.

Embedded acquisition C-code example

// Using Zest_sensor_4-20mA from 6TRON

// spi, cs, reset and data_rdy need to be defined for the board used
ADS131A04 sensor_4_20ma(spi, cs, reset, data_rdy);
EventQueue queue;
Thread thread;

void process_function(adc_data_struct data)
{
    // Write any process to be executed on data
    data = pre_process_function(data);
    // Then save data on SD card or send it over a serial link to be saved
    send_save_function(data);
}

void acquisition(void)
{
    adc_data_struct data;
    sensor_4_20ma.read_adc_data(&data);

    process_function(data);
}

// Defer the acquisition to the event queue thread
void interrupt(void)
{
    queue.call(acquisition);
}

int main()
{
    sensor_4_20ma.attach_callback(interrupt);
    thread.start(callback(&queue, &EventQueue::dispatch_forever));
}

In order to use the eML inference, the above process_function() can be modified to transfer acquired data to the eML.

eML inference C-code example

void eml_inference(adc_data_struct data)
{
    // Re-organize data to match the input layout of the eML used
    // (eml_buffer and score must be declared with types matching the eML)
    eml_buffer = organize_data(data);
    // Execute eML inference
    score = eml_score(eml_buffer);
}

void process_function(adc_data_struct data)
{
    // Write any process to be executed on data
    data = pre_process_function(data);
    // Then transfer data to eML
    eml_inference(data);
}