February 28, 2020
We’re participating in the autonomous greenhouse challenge: growing tomatoes without entering the greenhouse. We manage the greenhouse from behind our laptops and base our decisions on the real-time data we receive about the indoor climate, outside conditions and weather forecasts. To do this, we’ve been developing multiple machine learning applications. These applications guide the cultivation strategy and, subsequently, the actions taken to reach the desired climate.
Machine Learning Challenges
Developing and operationalising large-scale machine learning applications comes with many challenges. One reason is the inherent nature of machine learning: data keep evolving and models are stochastic, which means you cannot know in advance exactly how a model will behave.
In software engineering, code is version controlled to manage changes over time (think of the numbered software updates on your smartphone). In machine learning, there is no standardised solution for managing changes in code, data and model characteristics at the same time, largely because the field is still immature. Many initiatives are trying to solve this problem, for example MLflow and Data Version Control (DVC), but these have their own limitations, which are outside the scope of this blog.
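To make the versioning problem concrete, here is a minimal Python sketch of one way to tie code, data and model settings together in a single identifier. It is an illustration only, not how MLflow, DVC or our own platform work; the function name `fingerprint` and all example values are hypothetical.

```python
import hashlib
import json


def fingerprint(code_version: str, data: bytes, params: dict) -> str:
    """Return one short ID that changes when code, data or parameters change."""
    digest = hashlib.sha256()
    digest.update(code_version.encode("utf-8"))
    # Hash the data separately so large datasets reduce to a fixed-size value
    digest.update(hashlib.sha256(data).digest())
    # sort_keys makes the result independent of dictionary ordering
    digest.update(json.dumps(params, sort_keys=True).encode("utf-8"))
    return digest.hexdigest()[:12]


# Hypothetical values: same code and parameters, one changed sensor reading
run_a = fingerprint("abc123", b"temp,humidity\n21.5,70\n", {"learning_rate": 0.1})
run_b = fingerprint("abc123", b"temp,humidity\n21.5,71\n", {"learning_rate": 0.1})
print(run_a != run_b)  # the two runs get different IDs
```

An identifier like this only tells you that something changed, not what changed or how to get the old version back, which is part of why dedicated tooling exists.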
AWS Project & Solutions
To solve some of these problems, we’ve been fortunate to receive help from two machine learning engineers at Amazon Web Services (AWS). AWS is a cloud provider, and we use their services to host, among other things, our servers, databases and machine learning models. As a company, 30MHz has worked closely with AWS for several years. For this reason, and because they’re excited about our work, we had the great opportunity to learn from and work with their engineers at our own office in Amsterdam for more than two weeks.
The goals of the project were twofold:
- Isolate the machine learning process for every grower/customer of 30MHz. We’ve been developing a solution that automatically trains machine learning models whenever new data is collected, for potentially thousands of growers. The entire model training process is isolated for every single grower. This kills two birds with one stone. First, we adhere to our customers’ wishes to use their data solely for their own needs: they own the data and therefore decide what happens with it. Second, we’re able to learn growing patterns that are unique to each grower.
- Develop a framework to support machine learning at scale. As displayed in the image below, machine learning code constitutes only a fraction of the entire solution. Together with AWS we’ve been developing and automating many of the surrounding components. This gives us two advantages. First, it improves the quality and reliability of our machine learning applications, because we track and monitor everything closely. Second, it accelerates future projects, because many of the components in the framework are reusable.
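The first goal, isolated training per grower that is triggered by new data, can be sketched in a few lines of Python. This is a simplified illustration under our own assumptions and not the actual implementation, which runs on AWS services; the class `GrowerPipeline`, the grower IDs and the use of a mean value as a stand-in "model" are all hypothetical.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import List, Optional


@dataclass
class GrowerPipeline:
    """Holds the data and the model of exactly one grower."""
    grower_id: str
    readings: List[float] = field(default_factory=list)
    model: Optional[float] = None  # stand-in "model": the mean reading
    trained_on: int = 0            # number of readings used in the last training

    def add_readings(self, values: List[float]) -> None:
        self.readings.extend(values)

    def retrain_if_new_data(self) -> bool:
        """Train only when data arrived since the last training run."""
        if len(self.readings) == self.trained_on:
            return False
        self.model = mean(self.readings)
        self.trained_on = len(self.readings)
        return True


# One pipeline per grower: no data or model is shared between them
pipelines = {gid: GrowerPipeline(gid) for gid in ("grower-a", "grower-b")}
pipelines["grower-a"].add_readings([20.0, 22.0])

retrained = {gid: p.retrain_if_new_data() for gid, p in pipelines.items()}
print(retrained)  # {'grower-a': True, 'grower-b': False}
```

Because each pipeline only ever sees its own readings, a grower's data is used solely for that grower's model, and only growers with new data trigger a training run.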
Improve and automate
With the knowledge and experience of the AWS engineers, we’ve been able to improve and automate a large part of our machine learning infrastructure. The result is a scalable and robust framework for machine learning applications on the 30MHz platform. So if you have exciting ideas or requirements that involve machine learning, feel free to reach out.