
I recently took a brief look into Maine’s labor demographics as part of a personal project. My family has grown up in Maine, all my siblings have gone to college there, and it holds a significant place in my heart and head. I understand, however, that there’s an enormous disconnect between the fantasy of Maine as a perpetual vacationland and the reality of living there. After graduating from college, Maine students face a harsh reality: a significant number of jobs in the state are low-paying, not very rewarding, and don’t require many qualifications. If you want to get a job in a restaurant or at a hotel on the coast, Maine is the place to be. If not, good luck as a new college grad with a minimal skill set. There are certainly many awesome jobs in the state, but they seem incredibly scarce. At any rate, this reality is extremely depressing for recent grads with thousands of dollars in student loans and less-than-ideal ways of paying that debt back. It is understandable why students graduate from schools in Maine and head elsewhere seeking employment.


With that said, this is simply my personal bias about the state of Maine, and I am okay with being proven wrong. The most effective way to prove otherwise? Data. Through gathering, restructuring, and analyzing data, I wanted to figure out whether Maine’s labor landscape has significantly changed over the past decade or so, whether my worldview has changed, or both. Looking at trends in labor demographics over the past 13 years seemed like a natural fit.


The first step in analyzing Maine’s labor trends was coming up with a list of pre-analysis assumptions, or a very informal hypothesis. The focus was simply on labor demographic trends, and the analysis was limited to the quantitative aspects of the labor economy, not the qualitative ones. The quality of jobs could be steadily increasing or decreasing, but for the sake of clarity, I chose to ignore this aspect.


Pre-Analysis Assumptions:

  • Employment in Maine is decreasing as a whole
  • The labor force participation within younger age groups has been steadily decreasing
  • A large majority of students are moving out of state to find employment
  • More older people (65+) are working until they die, rather than retiring and enjoying life


The next step of the process was choosing a kit of technical tools. I have external motivation to learn SQL syntax, database design, and various ETL functions, so MySQL had to be a part of the system. There are some great data manipulation and analysis modules for Python, so all extraction and transformation logic was to be executed via Python scripts. Lastly, the question of how to use Python to load data into a MySQL database was answered with a simple Google search. Executing bash commands with Python is fairly straightforward, but I was really hoping for a module to work with (read “I’m lazy and don’t want to type more SQL queries than I have to”). The MySQL/Python connector was the answer.






After downloading the initial XLS file from the Maine Department of Labor, the next step was to design and create an effective database. I created separate tables with their respective columns, established primary key/foreign key relationships, and populated the data. A Python script extracted the data from the XLS file, transformed it into appropriate Python data structures, and loaded it into the database using the MySQL/Python connector.
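
As a rough illustration of that extract-transform-load flow, here is a minimal sketch. Everything in it is an assumption rather than the actual script: the table and column names (`age_groups`, `labor_counts`, and so on) are hypothetical, the sample rows are made-up placeholders rather than Department of Labor figures, and Python’s built-in `sqlite3` module stands in for MySQL so the example runs without a server. With the MySQL connector, the connection would instead come from `mysql.connector.connect(...)` and the query placeholders would be `%s` rather than `?`.

```python
import sqlite3

# Placeholder rows shaped like cells read from a spreadsheet:
# (year, sex, age group, count in thousands). NOT real labor figures.
RAW_ROWS = [
    ("2005", "Male", "65+", "10"),
    ("2005", "Female", "65+", "8"),
    ("2006", "Male", "16-24", "40"),
    ("2006", "Male", "65+", "11"),
]

# Hypothetical two-table schema linked by a primary key/foreign key pair.
SCHEMA = """
CREATE TABLE age_groups (
    id    INTEGER PRIMARY KEY,
    label TEXT NOT NULL UNIQUE
);
CREATE TABLE labor_counts (
    id           INTEGER PRIMARY KEY,
    year         INTEGER NOT NULL,
    sex          TEXT NOT NULL,
    age_group_id INTEGER NOT NULL REFERENCES age_groups(id),
    thousands    INTEGER NOT NULL
);
"""


def transform(raw_rows):
    """Convert string cells into typed, normalized tuples."""
    return [
        (int(year), sex.strip().lower(), age.strip(), int(count))
        for year, sex, age, count in raw_rows
    ]


def load(conn, rows):
    """Insert rows, creating each age group once and linking to it by id."""
    cur = conn.cursor()
    for year, sex, age, count in rows:
        # INSERT OR IGNORE is SQLite syntax; MySQL spells it INSERT IGNORE.
        cur.execute("INSERT OR IGNORE INTO age_groups (label) VALUES (?)", (age,))
        cur.execute("SELECT id FROM age_groups WHERE label = ?", (age,))
        age_group_id = cur.fetchone()[0]
        cur.execute(
            "INSERT INTO labor_counts (year, sex, age_group_id, thousands) "
            "VALUES (?, ?, ?, ?)",
            (year, sex, age_group_id, count),
        )
    conn.commit()


def build_database(raw_rows):
    """Create an in-memory database and run the transform and load steps."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    load(conn, transform(raw_rows))
    return conn
```

Calling `build_database(RAW_ROWS)` returns a connection with both tables populated. The extract step (reading cells out of the XLS file) is left out here because it depends on whichever spreadsheet library is used.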




Why go through all this trouble when the data was already in XLS format and could easily be graphed? At first glance, graphing it directly would be the sensible thing to do, especially since graphing in Excel is so easy. There is, however, a strong case for designing a relational database and populating it with the data. First, the magic of an RDBMS is its ability to perform relational queries. One can quickly speed through a massive database, grabbing only the information and relationships one wants. For example, I can query the database and pull out labor numbers between the years 2005 and 2010, filtered by males older than 65. Doing the same thing by hand in Excel would be a long, tedious process of going through an entire spreadsheet and processing each line separately. Second, the time difference is significant. Without built-in functions, an ad hoc Excel lookup is effectively a manual linear scan: you have to iterate mentally, line by line, over the entire document, and then collect, sort, and relate the data yourself. A relational database does that work for you, and can use indexes to avoid scanning every row. Had I performed a bunch of queries against the Maine labor force Excel sheet by hand, it would have been highly inefficient and could possibly have taken weeks. Spending a few days learning database design and how the MySQL connector works was a significantly more efficient option.
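
To make the “males older than 65 between 2005 and 2010” example concrete, here is a small sketch of what such a query might look like. The single `labor` table, its column names, and the numbers in it are all hypothetical placeholders, and `sqlite3` is again used in place of MySQL so that it runs anywhere. The `SELECT` statement itself is standard SQL and should carry over to MySQL largely unchanged.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE labor (year INTEGER, sex TEXT, age_group TEXT, thousands INTEGER)"
)
# Placeholder values for illustration only, not real labor data.
conn.executemany(
    "INSERT INTO labor VALUES (?, ?, ?, ?)",
    [
        (2004, "male", "65+", 9),
        (2005, "male", "65+", 10),
        (2005, "female", "65+", 8),
        (2007, "male", "16-24", 40),
        (2010, "male", "65+", 12),
        (2011, "male", "65+", 13),
    ],
)


def older_male_workers(conn, first_year, last_year):
    """Return (year, thousands) rows for males 65+ within an inclusive year range."""
    query = """
        SELECT year, thousands
        FROM labor
        WHERE sex = ? AND age_group = ? AND year BETWEEN ? AND ?
        ORDER BY year
    """
    return conn.execute(query, ("male", "65+", first_year, last_year)).fetchall()


rows = older_male_workers(conn, 2005, 2010)
print(rows)  # [(2005, 10), (2010, 12)]
```

The filter lives entirely in the `WHERE` clause, so changing the question (a different age group, a different range of years) means changing parameters rather than re-reading the spreadsheet.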


So… what was learned? What insight was gained?


I had little prior experience with MySQL (other than through ActiveRecord in Rails), so I learned a significant amount about database/table design, establishing relationships, SQL queries, etc. Secondly, I learned how to use the MySQL/Python connector, and how simple it is to populate and query databases via Python scripts. Lastly, I learned how to build plots using the amazing



First of all, I learned that my pre-analysis assumption of a shrinking Maine workforce is incorrect. Since 2010, the workforce has been steadily increasing. Why? It could be attributed to multiple factors, including a recovering national economy, unemployed spouses going back to work due to family financial pressures, effective state legislative decisions, etc. I’m not going to get into a correlation/causality argument; employment figures are increasing.



Secondly, after analyzing the data for male vs. female employment, some interesting trends can be observed. As expected, the 2008 recession had a detrimental impact on the male workforce. The female workforce, however, has been increasing since 2008. Again, this trend could be attributed to women going back to work or being forced to work by family financial pressures. It could also signify large-scale openings in female-dominated industries during these years. This chart doesn’t include enough data to thoroughly answer these questions, but it certainly initiates discussion and thought.



Third, just as I suspected, the labor force between ages 16 and 24 is (a) very volatile and (b) digging itself out of a downswing. It appears that more individuals in this age group are working than in 2012, but not nearly as many as in 2006. What is the reasoning behind this, and why was the 16-24 category hit so hard by 2008? Among many other reasons, I suspect that low-skilled jobs were among the first things cut by struggling businesses, and a lack of experience posed a serious problem for finding new employment.



Lastly, as discussed at the beginning of this post, individuals in the 65+ age category are, in fact, working longer into their lives. This could be caused by many factors, such as increasing healthcare costs, the need to support younger family members, employers being unable to find help with the same qualifications, etc. Interesting!




Side note: I also learned that there are many discrepancies in Maine’s published employment demographic data. I seriously hope this was caused by rounding issues and dropped significant figures, not by technical incompetence. Sorry, but (16+33+63+69+82+78+23)*1000 equals 364,000, not 363,000.
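
For what it’s worth, the arithmetic is easy to check with a few lines of Python. The sketch below uses the figures quoted in the side note above; the `check_total` helper is hypothetical, and the rounding bound assumes that every published figure was rounded to the nearest thousand, which is my guess rather than something the source data states.

```python
def check_total(components, published_total, tolerance=0):
    """Return (computed sum, difference from published total, within tolerance?)."""
    computed = sum(components)
    diff = computed - published_total
    return computed, diff, abs(diff) <= tolerance


# Age-group figures and published total, both in thousands.
age_groups = [16, 33, 63, 69, 82, 78, 23]
published_total = 363

computed, diff, ok = check_total(age_groups, published_total)
print(computed * 1000, diff * 1000, ok)  # 364000 1000 False

# Assumption: each figure is rounded to the nearest thousand, so each can be
# off by up to 0.5. Seven rounded components plus one rounded total can then
# disagree by at most 7 * 0.5 + 0.5 = 4 (thousand).
rounding_bound = len(age_groups) * 0.5 + 0.5
_, _, plausible_rounding = check_total(age_groups, published_total, rounding_bound)
print(plausible_rounding)  # True
```

Under that assumption, a gap of 1,000 is well within what rounding alone could produce, so this particular mismatch is at least consistent with the charitable explanation.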
