Here's what happened to it: https://a79dca30.career-path-analysis.pages.dev/. Let's back up a bit, though, first, and talk about this trash research and why it's still valuable...
Weeks ago I published "Where'd our candidate pipeline go?!?!". Gemini's got your back if you need the TL;DR:
- Shrinking Candidate Pool: The author investigates a significant decrease in applicants for biochemistry graduate positions at a research lab, dropping from over 100 to around 10 after the pandemic. This prompted a small research project.
- Data Collection Method: To study this, the author scraped LinkedIn profiles of UC Davis biochemistry graduates (2018-2022) using a custom Tampermonkey script and then used a Python script with Playwright to collect screenshots of their experience and education sections. A separate script then used Google's Gemini to extract text from these screenshots.
- Future Analysis: The author plans a further analysis comparing the collected data from two different graduate cohorts (one recent, one 10 years older) to identify differences in job titles and explore potential reasons for the shrinking candidate pool.
Here we are now, with the conclusion to that first article. The real value of this post for most of you is thinking about this whole thing from the mindset of small-scale research, which is a tool I wish more of you would gain competence in.
Picking up where we left off... I got all ready to scrape from LinkedIn the second cohort of UC Davis Biochemistry graduates (from this public page: https://www.linkedin.com/school/uc-davis/people/?educationEndYear=2022&educationStartYear=2018&keywords=Biochemistry), adjusted the date range on the filters on that page, went through the slow-motion data extraction, examined the output file, and eventually noticed it was the EXACT SAME list of people as the first cohort.
That's pretty typical of LinkedIn: a lot of black box behavior when it comes to filtering, even in their paid products. On that page the keyword filter works but the date range filter sorta doesn't. Fine. I now have a barely-useful piece of knowledge about LinkedIn's behavior and a JSON file of every UC Davis Biochemistry graduate who has shared that fact on LinkedIn and made their profile public. Onward!
I get Claude to whip up a Streamlit app for me to start looking at the data. My fundamental way of slicing the data is: two cohorts of people. The most recent cohort is those showing attendance at UC Davis in the 8-year window from 2016-2024, and an older cohort 10 years their senior (in terms of self-reported attendance at UC Davis). That's 108 people (the 1998-2006 cohort) vs. 81 people (the 2016-2024 cohort).
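If you want to picture the cohort split, here's a rough sketch. It is not the code behind the app: the record layout (`name`, `start_year`, `end_year`) and the rule "the whole attendance span must fall inside the window" are my assumptions for illustration.

```python
# Hypothetical record layout; the real scraped JSON may look different.
COHORTS = {
    "1998-2006": (1998, 2006),
    "2016-2024": (2016, 2024),
}

def assign_cohort(start_year, end_year):
    """Return the cohort label whose window contains the whole attendance span, else None."""
    for label, (lo, hi) in COHORTS.items():
        if lo <= start_year and end_year <= hi:
            return label
    return None

# Made-up people, only to show the mechanics.
people = [
    {"name": "A", "start_year": 1999, "end_year": 2003},
    {"name": "B", "start_year": 2018, "end_year": 2022},
    {"name": "C", "start_year": 2009, "end_year": 2013},
]

by_cohort = {}
for person in people:
    label = assign_cohort(person["start_year"], person["end_year"])
    if label is not None:  # people outside both windows are dropped
        by_cohort.setdefault(label, []).append(person["name"])

print(by_cohort)
```

Anyone whose self-reported dates straddle a window boundary gets dropped under this rule; a looser rule (say, "graduation year inside the window") would keep more people at the cost of fuzzier cohorts.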
The first thing I do is get Claude to create two histograms of job titles, one per cohort, showing the distribution of job titles within each.
Then, of course, the noisiness of the data smacks me in the face. LinkedIn is user-entered data with almost no normalization, so you're always going to get this even under the best of circumstances. I spend some hours getting Claude to implement some semantic grouping of job titles to stylize the data somewhat and reduce the noisiness that makes it difficult to see patterns (if they exist) in the data.
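To give a feel for what "grouping job titles" means in practice, here's a deliberately crude sketch. It uses plain keyword rules, which is a simpler stand-in and not the semantic grouping Claude built for the app; the rule list and sample titles are made up for illustration.

```python
from collections import Counter

# Hypothetical keyword rules, checked in order; the first match wins.
GROUP_RULES = [
    ("postdoc", "Postdoctoral Researcher"),
    ("research associate", "Research Associate"),
    ("research assistant", "Research Assistant"),
    ("lab", "Laboratory Technician"),
]

def normalize_title(raw_title):
    """Map a free-text job title to a coarse group, or 'Other' if no rule matches."""
    cleaned = " ".join(raw_title.lower().split())  # lowercase, collapse whitespace
    for keyword, group in GROUP_RULES:
        if keyword in cleaned:
            return group
    return "Other"

def title_histogram(raw_titles):
    """Count how many titles fall into each group."""
    return Counter(normalize_title(t) for t in raw_titles)

sample = [
    "Research Associate I",
    "Sr. Research Associate",
    "Postdoc Fellow",
    "Lab Tech",
    "Tennis Coach",
]
# Text histogram: one '#' per title in the group.
for group, count in title_histogram(sample).most_common():
    print(f"{group:26s} {'#' * count} ({count})")
```

Substring rules like these misfire easily (the `lab` rule would also catch "Collaboration Manager"), which is part of why real title cleanup eats hours.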
Here's what I see after that:
I then spend an inordinate amount of time trying to get a usable Sankey diagram showing how the two cohorts progress through various possible job roles along the course of their career. I was hoping for something like https://www.nytimes.com/interactive/2019/01/26/opinion/sunday/paths-to-congress.html but, uh, it didn't work out so nicely.
LOL. Oh well, onward to what I knew would work better... pivot tables!
These aren't particularly beautiful, but when I spend a bit of time staring at these, I start to see or rather feel the vibe of some potential patterns.
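For the curious, a pivot table like these boils down to counting. Here's a minimal sketch using only the standard library; the row layout (cohort, position number in the career, grouped title) and the sample rows are assumptions of mine, not the app's actual data model.

```python
from collections import Counter, defaultdict

# Hypothetical rows: (cohort, career position number, normalized job title).
rows = [
    ("1998-2006", 1, "Graduate Research Assistant"),
    ("1998-2006", 1, "Laboratory Technician"),
    ("1998-2006", 2, "Research Associate"),
    ("2016-2024", 1, "Graduate Research Assistant"),
    ("2016-2024", 1, "Intern"),
    ("2016-2024", 1, "Intern"),
    ("2016-2024", 1, "Tutor"),
]

def pivot_shares(rows):
    """Return {(cohort, position): {title: percent of that cell's rows}}."""
    counts = defaultdict(Counter)
    for cohort, position, title in rows:
        counts[(cohort, position)][title] += 1
    shares = {}
    for key, counter in counts.items():
        total = sum(counter.values())
        shares[key] = {title: 100.0 * n / total for title, n in counter.items()}
    return shares

table = pivot_shares(rows)
for (cohort, position), titles in sorted(table.items()):
    # Most common title first within each cohort/position cell.
    for title, pct in sorted(titles.items(), key=lambda kv: -kv[1]):
        print(f"{cohort}  position {position}  {title:28s} {pct:5.1f}%")
```

Reading down one position across the two cohorts is where differences in how concentrated or spread out the titles are start to show.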
I ask Claude and Deepseek both to take a look at the data (via CSV uploads). Before I tell you what we all think, a Small-Scale Research lesson:
Small-Scale Research Takeaway #1
Being an insider, or having skin in the game, is a good thing with Small-Scale Research. It means you have context, and context helps you make sense of information.
Thus far in this "what happened to our candidate pipeline?!?!" research, I've collected and organized information that could help answer the question. But I semi-jokingly called it trash research in the opening to this post because in my hands, it kind of is trash, no matter how well-organized the data is. I just don't have the context about the profession of biochemistry to effectively interpret this data.
Someone who does have that context would have an easy time answering questions like this:
- What is a common career path for a biochemistry graduate?
- What are high-status and low-status jobs within biochemistry?
- Where is the "real money" in biochemistry jobs?
I can only guess at what the answers to those questions might be.
One of the nice things about LLMs is they can be an easy way to quickly gather basic context I don't have. So I ask Claude and Deepseek for their interpretation of the data. After a bit of followup prompting, here's what Claude said:
Career Path Evolution Analysis: UC Davis Biochemistry Graduates (1998-2024)
Key Conclusion
The career trajectories of UC Davis biochemistry graduates have evolved from traditional linear academic/research paths to more diverse, flexible career pathways, reflecting both market demands and changing professional preferences.
Supporting Arguments
1. Increased Career Path Diversification
- Traditional Focus (1998-2006 Cohort)
  - Concentrated in research roles (Research Associate, Laboratory Technician)
  - Clear hierarchical progression (Graduate Research Assistant → Postdoctoral Researcher → Research Scientist)
  - Strong presence in academic positions (Assistant Professor, Research Assistant Professor)
- Modern Diversity (2016-2024 Cohort)
  - Broader range of entry positions (including non-research roles)
  - Multiple concurrent roles and internships
  - Integration of teaching, clinical, and industry positions
  - Evidence:
    - Entry positions include Medical Staff Assistant, Tennis Coach (not present in early cohort)
    - Higher proportion of teaching/mentoring roles
    - More varied technical specializations
2. Shift in Professional Development Patterns
- Early Cohort Pattern
  - Linear progression through research ranks
    - Position 1: Graduate Research Assistant (15.6%)
    - Position 2: Research Associate I (16.7%)
    - Position 3: Postdoctoral Researcher (14.9%)
    - Position 4: Research Assistant Professor (5.9%)
    - Position 5: Senior Research Associate (15.1%)
- Recent Cohort Pattern
  - More flexible, multi-track development
    - Multiple concurrent internships
    - Combination of research and teaching roles
    - Higher proportion of industry-oriented positions
    - Evidence of skill diversification through varied role types
3. Changes in Institutional Relationships
- Traditional Model (Early Cohort)
  - Long-term institutional commitments
  - Clear departmental hierarchies
  - Focus on single-track specialization
  - Evidence: Higher proportion of senior academic positions
- Modern Approach (Recent Cohort)
  - More fluid institutional relationships
  - Multiple concurrent affiliations
  - Emphasis on cross-functional roles
  - Evidence: Higher frequency of internships, temporary positions, and cross-sector roles
Implications
For Educational Institutions
- Need for broader career preparation programs
- Importance of industry partnerships
- Value of flexible degree programs
For Students
- Benefit of diverse skill development
- Importance of adaptable career planning
- Value of cross-functional experience
For Industry
- Opportunity for early career partnerships
- Need for flexible career development paths
- Value of academic-industry collaboration
Methodology Notes
- Analysis based on career progression data from two cohorts:
  - Early Cohort: 1998-2006 (n=108)
  - Recent Cohort: 2016-2024 (n=81)
- Career positions categorized into 5 progressive levels
- Data includes job titles, career stage, and temporal progression
That's also the same basic feeling I got from staring at the data for a bit. It seems the older cohort had less variance in their career paths. Sort of like there was a single "valley" of normalcy that more of them flowed down into, while the younger cohort moved on from graduation into a landscape with more, smaller, varied valleys they could flow into.
But again, I don't really have the context needed to interpret this data effectively!
Small-Scale Research Takeaway #2
Research does not exist independent of the audience for the research. The audience's priorities, incentives, uncertainties, and emotional immaturity are all critical parts of how any research is consumed.
You'll remember that a new friend's employer has a much-diminished pipeline of job applicants. They don't know why.
After consuming this research, they probably still won't know exactly why their applicant pool got so much smaller, but they might have at least one testable hypothesis. And that is a huge improvement! It's not an "answer", but it's a reduction in uncertainty; it's a direction to explore that's better than wild guessing or asking the senior person "what do you think we should do?".
Again, I'm lacking important context here, but here's one path that they might take from this data to a testable hypothesis:
1. Look at the kinds of jobs that are most common among the recent graduates cohort.
2. Imagine those jobs and the companies they're at as competitors for a scarce resource.
3. Think about what they might need to do to compete better against that competition. (ex: maybe the competition seems like it offers a way to have more impact in the world; my friend's company might try playing up that angle in their job ad or, gasp, actually marketing themselves as an impactful place to work outside the job ad itself.)
4. Try some small experiments guided by #3 above.
Or, alternately, they could use this shabby little dataset as a starting point for deeper research. As you know if you've read this far, I wrote some code to scrape LinkedIn. But it wouldn't take a human much more than a few hours to:
- Set up a free account on voiceform.com, put together a 2- to 4-question survey that allows voice response. Ask really open-ended questions like "As you were nearing graduation from [school], how did you gather information about employment opportunities?" and "How did you choose the job you ended up taking after graduation?".
- Secure a $500 budget for the research. Set up an account on tremendous.com. Load that half a g-stack into Tremendous.
- Start connecting with and messaging folks on LinkedIn, offering them $30 for 10 minutes of their time if they'll complete your survey. Automate and multi-channel-ize the outreach with a $100/mo tool like Lemlist if needed.
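A quick back-of-envelope check on what that budget buys, as a sketch. It assumes the whole $500 goes to $30 incentives, and that any outreach tool fees or platform fees are paid separately (an assumption of mine, not something spelled out above).

```python
def respondents_affordable(budget_usd, incentive_usd):
    """Return (number of incentives the budget covers, dollars left over)."""
    if incentive_usd <= 0:
        raise ValueError("incentive must be positive")
    return divmod(budget_usd, incentive_usd)

count, leftover = respondents_affordable(500, 30)
print(f"{count} completed surveys, ${leftover} left over")
# prints "16 completed surveys, $20 left over"
```

Sixteen voice responses is a small sample, but for reducing uncertainty around one question it can be plenty.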
I could spend many thousands of words pointing out the flaws in the methodology of all of the above. However, the goal of small-scale research is not methodological perfection; the goal is uncertainty reduction. So as long as the methodological shortcomings do not translate into highly flawed conclusions, we're fine. And generally, small-scale research doesn't really lead to a conclusion. It leads to refined understanding or reduced uncertainty, which should lead to action, new questions, and new research initiatives.
Here again is the link to the Streamlit app I put together (modified a bit to run in-browser): https://a79dca30.career-path-analysis.pages.dev/. (This sucker's heavy; it's running code inside your browser to let you play around with the data visualizations (thanks stlite!), so give it a minute to load.) If I'd had full-day stretches to work on this, I could have got from start to finish in 3 days. My friend https://www.linkedin.com/in/zelmanow/ would be proud (I hope) of the speed of execution. :)
This is but one example of Small-Scale Research. Sometimes it's more qualitative in nature. But it's always about reducing uncertainty around an important business question, or a question your clients struggle with. If you'd like my help with a SSR project, ranging from coaching to done-for-you, I'm a mere email or LinkedIn message away.