Mining Mayhem: Analysis of Mining Data Using Python
- Scott C
- May 24, 2023
- 3 min read
Updated: Jun 3, 2023

Introduction:
Mining is a dangerous job, and it is important to monitor a plant's metrics to catch problems early. For this analysis, a new data analyst has been hired by a fictional mining company called Metals R' Us. The data comes from a flotation plant. Mining involves the collection of soil, and the purpose of the flotation plant is to take that soil and filter out what is needed. In this case, the purpose is to separate iron from the rest of the soil. This analysis focuses on the boss's concerns about an event that occurred on June 1, 2017. The goal is to find possible explanations for the issue and determine whether anything is cause for concern.
More information regarding this process can be found in the video here.
Overview:
The visualizations show no clear correlation between % Iron Concentrate, % Silica Concentrate, Ore Pulp pH, and Flotation Column Level.
The data does appear to show a sudden drop-off on June 1, 2017.
Data:
The dataset was downloaded from Kaggle. It was recorded at a flotation plant between March 2017 and September 2017. This analysis focuses on a few key columns.

% Iron Concentrate: % iron at the end of the flotation process
% Silica Concentrate: % silica at the end of the flotation process
Ore Pulp pH: pH on a scale of 0-14; can indicate whether conditions could lead to chemical reactions
Flotation Column 05 Level: the height of the froth in the column; the lower the level, the higher the grade of the concentrate
Analysis:
The analysis was written in Python and run in Deepnote.
Housekeeping:
The initial code set things up and gathered some basic information before the analysis itself.
The following packages were imported to assist with the analysis:
Pandas for data manipulation and analysis
Matplotlib and Seaborn for data visualization

The next step was loading the data into a pandas DataFrame (df). The first few rows were then previewed.
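As a rough sketch of the loading and preview step: the snippet below reads a tiny made-up CSV from memory so that it runs on its own. The values are illustrative only, and the `date` column name is an assumption; with the real Kaggle file, its path would be passed to `pd.read_csv` instead.

```python
import io

import pandas as pd

# Made-up stand-in for the Kaggle CSV (illustrative values, not real readings).
# Assumption: the timestamp column is called "date".
sample_csv = io.StringIO(
    "date,% Iron Concentrate,% Silica Concentrate,Ore Pulp pH,Flotation Column 05 Level\n"
    "2017-06-01 00:00:00,66.9,1.3,10.1,457.4\n"
    "2017-06-01 01:00:00,66.5,1.4,9.9,455.0\n"
    "2017-06-01 02:00:00,65.8,1.6,9.7,430.2\n"
)

# With the real dataset, replace sample_csv with the path to the downloaded file.
df = pd.read_csv(sample_csv)

# Preview the first rows to confirm the columns loaded as expected.
print(df.head())
```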


df.shape was used to check the number of rows and columns.

df['% Iron Concentrate'] returned just the % Iron Concentrate column.

df.iloc selects rows by position; in this case, rows 100-104.
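The three inspection steps above might look something like this. The DataFrame here is random stand-in data (an assumption, so the snippet runs without the Kaggle file); only the column names come from the dataset.

```python
import numpy as np
import pandas as pd

# Random stand-in data: 200 rows, two of the columns used in this analysis.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "% Iron Concentrate": rng.normal(65.0, 1.0, 200),
    "Ore Pulp pH": rng.normal(9.8, 0.3, 200),
})

# shape is an attribute (no parentheses): (number of rows, number of columns).
n_rows, n_cols = df.shape
print(n_rows, n_cols)

# Square brackets with a column name return that single column as a Series.
iron = df["% Iron Concentrate"]

# iloc slices by position; the end of the slice is exclusive,
# so 100:105 returns rows 100, 101, 102, 103, and 104.
subset = df.iloc[100:105]
print(subset)
```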

Next, the date column in df was converted to a proper datetime type.
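A minimal sketch of the date conversion, assuming the timestamp column is named `date` and is read in as text. The rows are made up for illustration.

```python
import pandas as pd

# Stand-in rows; the "date" column starts out as plain strings.
df = pd.DataFrame({
    "date": ["2017-05-31 23:00:00", "2017-06-01 00:00:00", "2017-06-01 01:00:00"],
    "Ore Pulp pH": [9.9, 9.7, 9.5],
})

# Convert the strings to datetime64 so the column can be filtered and plotted by time.
df["date"] = pd.to_datetime(df["date"])
print(df.dtypes)
```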

Onto the data!
The boss initially wants summary statistics for the data. df.describe() is used to get this information.
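For illustration, here is what the summary step could look like on a small made-up DataFrame. `describe()` reports the count, mean, standard deviation, minimum, quartiles, and maximum for each numeric column.

```python
import pandas as pd

# Made-up values, just to show the shape of the output.
df = pd.DataFrame({
    "% Iron Concentrate": [64.0, 65.0, 66.0, 67.0],
    "Ore Pulp pH": [9.5, 9.7, 9.9, 10.1],
})

# One row per statistic, one column per numeric variable.
summary = df.describe()
print(summary)
```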

The boss wants to investigate issues that happened on June 1, 2017. The data was filtered to isolate the readings from that period.
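One way to do this kind of filtering is a boolean mask on the date column. The sketch below assumes the goal is the single day of June 1, 2017, and uses made-up hourly rows with a `date` column that is already a datetime type.

```python
import pandas as pd

# Made-up hourly readings from the evening of May 31 to the morning of June 2.
df = pd.DataFrame({
    "date": pd.date_range("2017-05-31 18:00", periods=36, freq=pd.Timedelta(hours=1)),
    "Flotation Column 05 Level": range(36),
})

# Keep rows from midnight on June 1 up to (but not including) midnight on June 2.
start = pd.Timestamp("2017-06-01")
end = pd.Timestamp("2017-06-02")
june_first = df[(df["date"] >= start) & (df["date"] < end)]
print(june_first.head())
```

Using a half-open range (>= start, < end) avoids having to spell out the last timestamp of the day.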


Next, the data was narrowed down to the specific variables of interest: % iron concentrate, % silica concentrate, pulp pH, and column level. Looking at the raw values, nothing in particular stands out during this period.
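Narrowing a DataFrame to a handful of columns can be done by passing a list of names. In this sketch the values are made up, and the `Other Sensor` column is a hypothetical placeholder for the columns that are left out.

```python
import pandas as pd

# Made-up rows; "Other Sensor" is a hypothetical stand-in for unused columns.
df = pd.DataFrame({
    "% Iron Concentrate": [66.9, 66.5, 65.8],
    "% Silica Concentrate": [1.3, 1.4, 1.6],
    "Ore Pulp pH": [10.1, 9.9, 9.7],
    "Flotation Column 05 Level": [457.4, 455.0, 430.2],
    "Other Sensor": [1, 2, 3],
})

# The four variables this analysis focuses on.
important_cols = [
    "% Iron Concentrate",
    "% Silica Concentrate",
    "Ore Pulp pH",
    "Flotation Column 05 Level",
]

# Indexing with a list returns a DataFrame with only those columns, in that order.
focus = df[important_cols]
print(focus)
```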

A pairplot was created using seaborn in order to view multiple graphs and relationships at once. Looking at the plots, there don't appear to be any clear linear patterns.


To confirm these results, corr() was used to create a correlation matrix. The resulting values are low, indicating no clear linear correlation between the variables.
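A sketch of the correlation step on random stand-in data (an assumption, so the numbers will differ from the real ones). By default, `corr()` computes Pearson correlation, which only measures linear relationships: values near 0 mean little linear relationship, and values near 1 or -1 mean a strong one.

```python
import numpy as np
import pandas as pd

# Random, independent stand-in columns, so correlations should be close to 0.
rng = np.random.default_rng(7)
df = pd.DataFrame({
    "% Iron Concentrate": rng.normal(65.0, 1.0, 100),
    "% Silica Concentrate": rng.normal(2.3, 0.5, 100),
    "Ore Pulp pH": rng.normal(9.8, 0.3, 100),
    "Flotation Column 05 Level": rng.normal(420.0, 30.0, 100),
})

# Pairwise Pearson correlation between every pair of columns.
corr_matrix = df.corr()
print(corr_matrix.round(2))
```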

Line charts were then made to see if there were any additional changes. The charts were created with seaborn, one per variable, showing how each variable changed throughout the day.

Looking at the first graph, % iron concentrate does seem to decrease steadily around June 1.

This drop is also seen for the % silica concentrate.

The drop is extremely noticeable when looking at the pH.

The clearest example of a drop on June 1 is the flotation column level. This is a clear indication of when the odd event occurred.
Key Takeaways:
There does not seem to be a clear correlation between % Iron Concentrate, % Silica Concentrate, Ore Pulp pH, and Flotation Column Level.
On June 1, 2017, there is a clear drop in all four variables, especially the flotation column level.
Conclusion:
Although something unusual appears to have happened on June 1, 2017, none of the factors appear to be related to one another. The first topic to discuss with the boss is whether the patterns seen on June 1 differ from those on other dates. It would also be worthwhile to look at all of the available variables and compare them to each other.


