This app uses Apache Flink for automated data processing and monitoring of a designated directory to read new files every 10 seconds. It analyzes fruit data and stores the results in a MySQL database. This tool streamlines agricultural data analysis, reducing manual effort and time. Prometheus and Grafana are used for logging and monitoring.
- Real-time monitoring of fruit quality and maturity
- Automated counting and classification
- Database integration for persistent record-keeping
- Overview
- Requirements
- Installation
- Setup Configuration
- Running the Application
- Running the Examples
- Running the Experiments
- Troubleshooting Common Issues
Before installing the Fruit Stream Processing app, ensure your system meets the following requirements.
- Flink:
- A UNIX-like systems, including Linux, Mac OS X, and Cygwin or WSL (for Windows).
- Java 11 - Verify your Java installation
by executing
java -version
in your command prompt. - Apache Flink.
- MySQL:
- MySQL and its server installed and running on your system.
- Clone the Repository: Clone the Fruit Stream Processing application repository to your system.
git clone https://github.com/eleadufresne/fruit-stream-processor.git
- Flink Setup: If you haven't installed Apache Flink, follow this tutorial to get started. You might need to install the latest Hadoop binaries.
- MySQL Setup: Start MySQL server on your system. If it's not installed, run the following commands
before proceeding.
# install MySQL sudo apt-get update sudo apt install mysql-server # ensure the MySQL server has been started sudo service mysql status # [optional] start the server if it is not running correctly sudo service mysql restart
With these installations and setups complete, you're ready to configure and run the Fruit Stream Processing application.
Please keep in mind the following steps when setting up WSL. If you're using a different environment, you may need to make some additional configurations.
-
Environment Variable: As a first step, let's create an environment variable for the flink installation directory. The command should look something like that.
export FLINK_HOME=/mnt/c/projects/apache/flink-1.18.0
Optionally, you may set up an environment variable for this repository. Otherwise replace
$FRUIT_DIR
with the actual path in the following steps.export FRUIT_DIR=/mnt/c/projects/fruit-stream-processor
-
Permissions: Ensure that you have enough permissions to use the repository.
chmod -R 777 $FLINK_HOME
-
Configuration File: Navigate to the
conf
folder of your Flink directory and open theflink-conf.yaml
file.-
Setup the Logs: Add the following line to the end of the file:
taskmanager.resource-id: stream-processing-logs
Flink will create a temporary folder with the given name to maintain its state logs as it sets up a cluster. These logs can also be accessed in the web UI under the task manager tab.
-
Setup the Connection: Update the following values in
flink-conf.yaml
.rest.port: 8081 rest.address: localhost rest.bind-adress: 0.0.0.0
This should allow you to access Flink via your Windows 'localhost' rather than the IP of your WSL instance.
-
-
Setup the Database: To run the app, you must first create database with appropriate schema to store the output.
# start the MySQL server sudo service mysql status # log into MySQL as root sudo mysql -u root # create a new db user CREATE USER 'fruit_enthusiast'@'localhost' IDENTIFIED BY 'Fru!t5'; GRANT ALL PRIVILEGES ON *.* TO 'fruit_enthusiast'@'localhost'; FLUSH PRIVILEGES; # log out of MySQL quit # log in as the newly created user, and type in the password mysql -u fruit_enthusiast -p # create a new database CREATE DATABASE fruits; # create a table inside the database USE fruits CREATE TABLE data ( feature VARCHAR(150) NOT NULL, # ripe, rotten, etc. count INT unsigned NOT NULL, # number of fruit with this feature PRIMARY KEY (feature) # make the feature the primary key );
-
[Optional] Faster WSL Network: Modify your
.wslconfig
file so it contains the following lines.[wsl2] nestedVirtualization=true memory=32GB vmSwitch = LAN experimental.networkingMode=mirrored localhostforwarding=true [boot] systemd=true
-
[Optional] Performance Monitoring Dashboards Although Flink provides a dashboard with some useful metrics, it might be worth installing Prometheus and Grafana if you are looking for better data visualization. Keep in mind that if you decide not to install them, you will need to comment out the relevant lines in the experiment script.
# install prometheus sudo apt-get install prometheus # install Grafana sudo apt-get install grafana # start the Grafana server sudo service grafana-server start
Open a browser on your Windows machine and navigate to localhost:3000. Log in using the default credentials (admin/admin). Create a new dashboard for this experiment by uploading the
json
template we provide under the config folder. We also provide a Flink config that enables Prometheus, and opens up ports for it. In order to use this reporter you must copy/opt/flink-metrics-prometheus-1.18.0.jar
into the/lib
folder of your Flink distribution
By default, the job monitors files from orchard-watch
(provided under util
) in the working directory. Before
proceeding, either place orchard-watch
to your working directory or provide paths as argument. Now, from the root
of your UNIX-like system, execute the following commands.
# start a cluster
$FLINK_HOME/bin/start-cluster.sh
# generate a JAR file for the Flink job
cd $FRUIT_DIR && mvn clean package && cd ~
# start monitoring the directory
$FLINK_HOME/bin/flink run $FRUIT_DIR/tar*/fruit*.jar <optional-arguments>
Run $FLINK_HOME/bin/flink run $FRUIT_DIR/tar*/fruit*.jar --help
to get the list of arguments.
To access the Flink dashboard:
-
Standard Access:
- Navigate to localhost:8081 in your web browser. This is the default address for the Flink dashboard.
-
Accessing from WSL:
- If you are using WSL, and
localhost
does not work, you may need to use the IP address of your WSL instance. - Find the IP address by running
hostname -I
in your WSL terminal. - Then, access the dashboard via http://:8081 in your browser.
- If you are using WSL, and
If you encounter any issues accessing the dashboard from WSL, please refer to the Connection Issues on WSL section for further troubleshooting.
When you are done, stop the cluster with$FLINK_HOME/bin/stop-cluster.sh
and shut down the MySQL server with
mysqladmin --user root shutdown
.
The program features two modes of execution, one for processing images and another for text files. To run both, simply
execute the run_examples.sh
file. The output can be found under the ./orchard-watch/output
directory.
Make sure that you have correctly configured the environment before proceeding further. We have provided a script that enables you to execute all experiments and evaluate performance on a large input, throughput, latency, and bandwidth.
To execute the experiments, you need to launch the "run_experiments.sh" script with the option 0. This will initiate all
the experiments without having to build the JAR file. You can also perform individual experiments that are stored in
the script
directory.
Once you have launched the script, you can open the Flink dashboard on your web browser atlocalhost:8081. To access Prometheus, go to localhost:9090, and for Grafana, visit localhost:3000.
In terms of monitoring, we are interested in the following metrics:
numRecordsOutPerSecond
- Number of records this task/operator sends per second. (throughput)latency
- Latency from the source operator to this operator.restartingTime
- The time it took to restart the job, or how long the current restart has been in progress.
If you come across a java.nio.file.FileSystemException
, it may indicate that the task-executor failed to stop properly.
To tackle this issue, simply run the jsp
command in your UNIX-like terminal. This will display the running processes.
Keep executing $FLINK_HOME/bin/stop-cluster.sh
until all standalone-sessions and task-manager instances are terminated.
Alternatively, you may use the Windows Resource Monitor to find out which processes are using the file and terminate them manually.
- Open the Task Manager (Ctrl + Shift + Esc).
- Go to the "Performance" tab.
- Click on "Open Resource Monitor" at the bottom.
- In the Resource Monitor, navigate to the "CPU" tab.
- Expand the "Associated Handles" section.
- Type in the name of the file you're looking for in the search box.
- Terminate any process that is using the file.
If you're using WSL, you might need to add a rule to open up ports for Flink since WSL runs on its own IP address. In the Windows PowerShell, execute the following to create a new rule.
# open ports 1000-60000 (feels free to make it stricter)
New-NetFireWallRule -Profile Private -DisplayName 'Open ports for Flink on WSL'
-Direction Inbound -LocalPort 1000-65000 -Action Allow -Protocol TCP
# restart wsl to apply changes
wsl --shutdown
wsl
In your WSL terminal (e.g. Ubuntu), you may open all -or specific- ports using the following commands.
# allow all incoming traffic
sudo ufw default allow incoming
# allow all outgoing traffic
sudo ufw default allow outgoing
# allosw all tcp ports
sudo ufw allow proto tcp to any port 1:65535
# allow the SSH port 22
sudo ufw allow ssh
# apply the changes
sudo ufw reload
# enable the firewall
sudo ufw enable
# check the status
sudo ufw status verbose
You may also need to set up SSH if it's not already done.
# install SSH server
sudo apt update
sudo apt install openssh-server
# start SSH service
sudo systemctl start ssh
sudo systemctl enable ssh
# check SSH service status
sudo systemctl status ssh
You can edit the configuration file located at /etc/ssh/sshd_config
if necessary. Restart the SSH service to
apply the changes with sudo systemctl restart ssh
.