Data Processing in MATLAB

The MATLAB Online tool allows you to work in the cloud with the latest available version of MATLAB. Thanks to a specific library, available at the link below, it is possible to work with InfluxDB databases from MATLAB Online.

https://github.com/EnricSala/influxdb-matlab

Below, you will find the code templates that you will need to use to:

  • Read the data from your InfluxDB database and stage it in the MATLAB workspace.
  • Process the data, calculating the average of each of the five parameters received over the last minute of information recorded and available in the database.
  • Write the processed data back to your InfluxDB database.

It is recommended that you generate a script with four sections, separating them using the characters "%%", as you can see in the following screenshot:

This will allow you to progressively execute each new section that you program, always being able to execute the entire code if you wish, using the Run or Run Section buttons:

You can access MATLAB Online through the following link, signing in with your MathWorks account credentials.

http://matlab.mathworks.com

You will find all the information on how to access and/or create a MathWorks account in the additional documentation that has been provided to you.

Simple data analytics

First section: Connection to the database

In order to access the database we want to work with, it is necessary to define the credentials using the following code in a MATLAB script:
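The code itself appears as a screenshot in the original handout; as a rough sketch of what it contains, based on the influxdb-matlab README (all parameter values below are placeholders that you must replace with your virtual machine and database details):

```matlab
% Connection parameters (placeholders: use your own VM and database details)
URL = 'http://your-vm-address:8086';
USER = 'your_username';
PASS = 'your_password';
DATABASE = 'your_database';

% Build the InfluxDB client provided by the influxdb-matlab library
influxdb = InfluxDB(URL, USER, PASS, DATABASE);
```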

Then, still in the same script, use the following code to check that you were able to successfully connect to the database:
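The check can be sketched as follows; according to the library README, ping() returns a status flag and the response latency, and the printed message mirrors the one expected in the objective below:

```matlab
% Verify the connection to the database
[ok, millis] = influxdb.ping();
if ok
    fprintf('InfluxDB is OK (%.1f ms)\n', millis);
else
    error('Could not reach the InfluxDB server');
end
```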

Objectives

Fill in the code containing the virtual machine and database parameters, and confirm that you receive the message 'InfluxDB is OK' in the MATLAB Online command window.

Second section: Reading measured data

Now you must perform the data reading, requesting the InfluxDB database to return the last minute of recorded information. The following code will serve as a template:
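As a sketch of such a template (the measurement and field names here are examples, not necessarily the ones in your database; the accessors follow the influxdb-matlab README):

```matlab
% Request the five statistical indicators (names are illustrative)
str = ['SELECT "rms","kurtosis","skewness","crest","variance" ' ...
       'FROM "Vibration" WHERE time > now() - 1m'];
result = influxdb.runQuery(str);
result.show();                       % quick look at what was received

% Stage the received columns in the workspace
series = result('Vibration');        % select the series by measurement name
time   = series.time();              % timestamps of the samples
rms    = series.field('rms');        % one vector per requested field
```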

Objectives

  1. Complete the names of the columns so that it is possible to receive the five statistical indicators.

The InfluxDB documentation specifies what type of request must be made to receive the data for a specific period of time, as indicated in the following example:
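For instance, the basic SELECT statement accepts a relative time filter in the WHERE clause; the snippet below (with illustrative measurement and field names) keeps only the samples newer than one minute ago:

```matlab
% InfluxQL relative-time filter: data recorded in the last minute
str = 'SELECT "value" FROM "Measurement" WHERE time > now() - 1m';
result = influxdb.runQuery(str);
```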

In the following link you can find the complete information:

https://docs.influxdata.com/influxdb/v1.8/query_language/explore-data/#the-basic-select-statement

  2. Complete the configuration of the request (after the WHERE) so that it requests the data of the last minute of recorded information each time the script is executed. Check that it works correctly.

Third section: Data processing

In a new section you will have to write code that calculates the average of each of the five parameters received over the last minute of information recorded and available in the database.
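Assuming the reading section staged one vector per parameter (the variable names below are illustrative), the processing reduces to MATLAB's mean function:

```matlab
% Average of each parameter over the last minute of received data
% (replace the names with the vectors created in the second section)
avgRms      = mean(rms);
avgKurtosis = mean(kurtosis);
avgSkewness = mean(skewness);
avgCrest    = mean(crest);
avgVariance = mean(variance);
```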

Objective

  1. Verify that the code performs the data processing.

Fourth section: Write processed data

Finally, you will prepare the obtained values to be sent to the database. They will be stored with the same timestamp as the last value retrieved from InfluxDB to perform the calculation.

Note that to write data, you must first be connected to the database. To do so, you can use the first two codes provided in point 3.1.

The following code allows you to prepare the results. You must complete it with as many lines as results you are going to write to the database.
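A sketch of that preparation, using the Point builder from the influxdb-matlab README (the field names are illustrative; 'Results' is the series name mentioned in the objective of this section):

```matlab
% Prepare the results as a point in the 'Results' series, stamped with the
% time of the last sample used for the calculation
point = Point('Results') ...
    .fields('avg_rms', avgRms, 'avg_kurtosis', avgKurtosis) ...
    .time(time(end));
% ...add one 'fields' entry per result you are going to write
```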

It is possible to preview the content that will be written to the InfluxDB database using the following code:

If the result is successful, it is possible to write it to the database using the following code:
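With the influxdb-matlab writer, the final write can be sketched as follows (point is the result prepared in the previous step):

```matlab
% Write the prepared point(s) to the InfluxDB database
influxdb.writer() ...
    .append(point) ...
    .execute();
```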

Objective

  1. Write the processed data to the database. Optionally, you can verify that it has been written correctly to a series called 'Results' by accessing the container that runs the InfluxDB database, as explained in the guide document of the first part of the project. In any case, you will find out in the next point, when you display the processed data in Grafana.

Data analytics in MATLAB Online

First of all, you must download two files from Atenea:

  • The MATLAB script template that you will use in this part of the project.
  • The base workspace file that contains the mathematical models you need. Once you have accessed MATLAB Online, use the following link in another tab to upload the workspace to MATLAB Drive. It should then appear in your MATLAB Online tab; you may have to refresh the tab in order to see it.

https://drive.matlab.com/login

As in the second part of the project, it is recommended that you respect the sections of the template provided, with its sections separated using the characters “%%”. This will allow the code to be executed progressively to verify that it works as expected.

First section: Connect to the database

The first section of the script is used to connect to the database. Fill in the details of your InfluxDB database so that the MATLAB script is able to connect to it.

Objective

  1. Confirm that you get the message 'InfluxDB is OK' in the MATLAB Online command window.

Second section: Reading data

In this section, a data request is made to the database with which you have connected.

Objective

  1. Fill in the parameters of the query to read the data of the four statistical parameters (request the data recorded in the last 5 minutes).

Third section: Detection of new (unprocessed) data

To make the data analysis as light as possible in computational terms, a repeated-data check has been included in this section. The script remembers which data entry it read last the previous time it was executed, and uses that information to check whether that entry appears among the newly read data. If it does, all earlier entries are considered already processed and are automatically discarded.

This data management implies that if you make two consecutive requests (executing the second section twice), the second time you run the new-data detection all the data may be considered repeated and be completely ignored. A possible workaround is not to run the third section if you need to repeat the data request for some reason; keep that in mind.
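The mechanism described above can be sketched as follows (all variable names are assumptions, not the template's own bookkeeping):

```matlab
% Remember, between runs, the timestamp of the last processed sample
if ~exist('lastProcessedTime', 'var')
    lastProcessedTime = NaT;   % nothing processed yet
    % note: match the TimeZone of your time vector if it has one
end

% Keep only entries newer than the last one processed
isNew = isnat(lastProcessedTime) | (time > lastProcessedTime);
newData = data(isNew, :);

% Update the bookmark for the next execution
if any(isNew)
    lastProcessedTime = max(time(isNew));
end
```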

Fourth section: Data Transformation

The matrix "PCA_matrix" has been generated using principal component analysis techniques. The purpose of this matrix is to reduce the dimensionality of the vibration data, which initially has a dimension of 4 parameters, to 2 parameters (named PC1/Feature 1 and PC2/Feature 2).

Objective

  1. Use the matrix “PCA_matrix” and program the necessary matrix operation.

Fifth section: Novelty Detection

This section uses a model, available in the workspace as "model". After being trained offline, it has established ranges for the two parameters produced by the transformation of the previous section, so it can decide whether those parameters represent a known situation or not.

This step prior to the diagnosis is fundamental: if the model is not capable of associating new data with a situation it already knows, any conclusion drawn from the diagnosis process will be totally irrelevant. In the script, MATLAB's predict function returns a numeric value and a boolean value; the latter indicates whether the data is known or not.
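That call can be sketched as follows (the variable names are assumptions; the output order follows the description above, with the boolean flag second):

```matlab
% Classify each transformed sample as known (true) or novel (false)
[score, isKnown] = predict(model, features);
isKnown = logical(isKnown);   % ensure the flag can be used for indexing

% Separate the two classes: only the known data goes on to the diagnosis
knownData   = features(isKnown, :);
unknownData = features(~isKnown, :);
```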

At this point you will see how a figure is created in MATLAB similar to the following, which shows the distribution of the data and colors the data classified as known in blue, and the unknown in red.

In the code you will see that the two classes of data are separated in order to use only the known data in the diagnosis, despite the fact that at the end of the script both vectors will be written to the database again to be able to view them in Grafana and not lose information.

Sixth section: Diagnosis

In this section, a diagnosis is made with the data classified as known. At this point you will see how a mesh of red and blue points is generated. These colors define the domain in which the neural network considers a data point to belong to the “Healthy” condition (blue) or the “Bearing failure” condition (red). The data entries are shown as green dots in the figure that will appear, which should be similar to the following.

Additionally, the processed data is sent to the neural network to obtain its membership value, which indicates with what probability the network affirms that a data point belongs to the "Healthy" or "Bearing failure" condition.
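A sketch of that last step, assuming the trained network is available in the workspace (the variable names, class order, and the network's input orientation are assumptions, so check them against your own template):

```matlab
% Obtain the membership values for the known data
probs = net(knownData');               % one column of class scores per sample
[membership, classIdx] = max(probs, [], 1);
labels = {'Healthy', 'Bearing failure'};
diagnosis = labels(classIdx);          % condition with the highest membership
```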

Seventh section: Write Diagnostic Data

In this section, a new measurement is generated where the data processed and used for the diagnosis is written.

Objective

  1. Complete the missing data to be able to write the data resulting from the diagnosis.

Eighth section: Write Data Classified as Novelty

In this section, a new measurement is generated where the processed data is written and classified as unknown.

Objective

  1. Complete the missing data to be able to write the unknown data.