Hello and welcome to another data analysis with Python and Pandas tutorial. In this tutorial, we're going to change up the dataset and play with minimum wage data now.

You can find this dataset here: Kaggle Minimum Wage by State. This dataset goes from 1968 to 2017, giving the minimum wage (lowest amount of money that employers can pay workers by the hour), by state.

Description of the data:

Year: Year of data

State: State/Territory of data

Table_Data: The scraped, unclean data from the US Department of Labor.

Footnote: The footnote associated with Table_Data, provided by the US Department of Labor.

High.Value: As there were some values in Table_Data that had multiple values (usually associated with footnotes), this is the higher of the two values in the table. It could be useful for viewing the proposed minimum wage, because in most cases, the higher value meant that all persons protected under minimum wage laws eventually had minimum wage set at that value.

Low.Value: This is the same as High.Value, but has the lower of the two values. This could be useful for viewing the effective minimum wage at the year of setting the minimum wage, as peoples protected under such minimum wage laws made that value during that year (although, in most cases, they had a higher minimum wage after that year).

CPI.Average: This is the average Consumer Price Index associated with that year. It was used to calculate 2018-equivalent values.

High.2018: This is the 2018-equivalent dollars for High.Value.

Low.2018: This is the 2018-equivalent dollars for Low.Value.

Once you have downloaded the data, let's begin working with it.

import pandas as pd

# UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa0 in position 0: invalid start byte
df = pd.read_csv("datasets/Minimum Wage Data.csv", encoding="latin")

Right away, we've got some encoding issues. Looks like the user saved the formatting funky-like. Because the data was grabbed from the internet, it would have made more sense to leave it in UTF-8, but, for whatever reason, that wasn't the case, and I initially hit an encoding error on loading it in. I tried latin encoding next, and boom, there we go. Now, let's go ahead and just save our own version, with utf-8 encoding!

df.to_csv("datasets/minwage.csv", encoding="utf-8")

df = pd.read_csv("datasets/minwage.csv")

df.head()

Let's check out a new functionality with pandas, called group by. We can automatically create groups by unique column values. Sounds familiar? It's exactly what we did before, just with pandas instead of our own Python logic. That's one thing I really enjoy with Pandas. It's very easy to work with Pandas using your own logic, or with some built-in Pandas logic.

gb = df.groupby("State")
gb.get_group("Alabama").set_index("Year").head()

Aside from getting groups, we can also just iterate over the groups:

act_min_wage = pd.DataFrame()

for name, group in df.groupby("State"):
    if act_min_wage.empty:
        act_min_wage = group.set_index("Year")[["Low.2018"]].rename(columns={"Low.2018":name})
    else:
        act_min_wage = act_min_wage.join(group.set_index("Year")[["Low.2018"]].rename(columns={"Low.2018":name}))

act_min_wage.head()

	Unnamed: 0	Year	State	Table_Data	Footnote	High.Value	Low.Value	CPI.Average	High.2018	Low.2018
0	0	1968	Alabama	...	NaN	0.00000	0.00000	34.783333	0.00	0.00
1	1	1968	Alaska	2.10	NaN	2.10000	2.10000	34.783333	15.12	15.12
2	2	1968	Arizona	18.72 - 26.40/wk(b)	(b)	0.66000	0.46800	34.783333	4.75	3.37
3	3	1968	Arkansas	1.25/day(b)	(b)	0.15625	0.15625	34.783333	1.12	1.12
4	4	1968	California	1.65(b)	(b)	1.65000	1.65000	34.783333	11.88	11.88

	Unnamed: 0	State	Table_Data	Footnote	High.Value	Low.Value	CPI.Average	High.2018	Low.2018
Year
1968	0	Alabama	...	NaN	0.0	0.0	34.783333	0.0	0.0
1969	55	Alabama	...	NaN	0.0	0.0	36.683333	0.0	0.0
1970	110	Alabama	...	NaN	0.0	0.0	38.825000	0.0	0.0
1971	165	Alabama	...	NaN	0.0	0.0	40.491667	0.0	0.0
1972	220	Alabama	...	NaN	0.0	0.0	41.816667	0.0	0.0

	Alabama	Alaska	Arizona	Arkansas	California	Colorado	Connecticut	Delaware	District of Columbia	Federal (FLSA)	...	Tennessee	Texas	U.S. Virgin Islands	Utah	Vermont	Virginia	Washington	West Virginia	Wisconsin	Wyoming
Year
1968	0.0	15.12	3.37	1.12	11.88	7.20	10.08	9.00	9.00

Groupby - Data Analysis with Python 3 and Pandas