Last week, I gave a guest lecture at NYU School of Engineering to Financial Engineering students on a Data Science topic. The lecture covered Unsupervised Learning techniques: PCA, K-means and Hierarchical clustering. I started the lecture by discussing the breadth of potential career paths in Finance. On the train ride home after the lecture, it occurred to me that I could use a clustering algorithm to build a map of the roles within a Bank. I think this would be helpful to anyone because there are universes amongst universes inside these large Banks; only select gray haired veterans have a true understanding of. Wouldn’t it be nice if we could draw a diagram showing that?
The most obtainable data on jobs that I could think of are the descriptions themselves. The goal would be to relate the descriptions, a natural language processing (NLP) problem. However, the first issue getting the job descriptions to create a first pass/proof of concept. Using monster.com’s “rss” feed, I was able to get about 15 descriptions for 4 separate job searches: ‘financial analyst’, ‘accounting manager’, ‘web developer’, ‘pharmaceutical sales’).
Much more work is required to take this seriously. Specifically: 1) more data 2) optimize vectorization thresholds 3) scrub/vet data appropriately 4) pick a good distance measure.