In our AEC course, we spend the first week on command-line basics, the second on using git from the command line, and the last two weeks on Python. This sometimes confuses students. “Doesn’t being an analytics engineer just mean using dbt?” they ask. Our answer is a resounding “no”. In fact, we think that restricting analytics engineering to just “developing with dbt” is severely limiting and, in the long run, sets neither data teams nor analytics engineers up for success.
I wrote one of the very first essays about analytics engineering, and for me the role has always been about building tools that help a data team operate more efficiently. Yes, designing a useful and legible data warehouse schema is a great way to make analysts more efficient, but the work shouldn’t end there. As I see it, analysts write SQL that dbt runs, while analytics engineers can make changes to dbt itself. So analytics engineers should know Python, and not just for data analysis and visualization.
Notably, there are a few things our AEC training course doesn’t teach:
- We don’t teach Jupyter notebooks
- We don’t teach pandas or NumPy
- We don’t teach any type of visualization or charting library
Instead, we have students running Python 3 in a Linux virtual machine via the command line while developing in VS Code. We focus on what we believe are key skills and patterns for an engineer:
- Managing environments with consistent package versions
- Debugging techniques, including pdb
- Writing and running unit tests
In just two weeks, students can write a command-line tool in Python, complete with unit tests. That is enough to make a change to dbt Core (or any other open-source Python package), add unit tests for it, and submit a pull request.
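To give a sense of scale, here is a minimal sketch of that kind of tool: a small command-line program with unit tests, written using only the standard library (`argparse`, `csv`, `unittest`). It is a hypothetical illustration rather than an exercise from the course; the file name `rowcount.py`, the `count_rows` function, and the `--no-header` flag are all made up for this example.

```python
"""rowcount.py: a hypothetical command-line tool that counts data rows in CSV files."""
import argparse
import csv
import unittest


def count_rows(lines, skip_header=True):
    """Return the number of non-empty CSV rows in `lines`, optionally skipping a header."""
    rows = [row for row in csv.reader(lines) if row]  # csv.reader yields [] for blank lines
    if skip_header and rows:
        rows = rows[1:]
    return len(rows)


def main(argv=None):
    """Parse command-line arguments and print a row count for each file given."""
    parser = argparse.ArgumentParser(description="Count data rows in CSV files.")
    parser.add_argument("paths", nargs="*", help="CSV files to count")
    parser.add_argument("--no-header", action="store_true",
                        help="treat the first row as data rather than a header")
    args = parser.parse_args(argv)
    for path in args.paths:
        with open(path, newline="") as f:
            print(f"{path}: {count_rows(f, skip_header=not args.no_header)}")


class CountRowsTest(unittest.TestCase):
    """Unit tests for the counting logic, kept separate from argument parsing."""

    def test_skips_header_by_default(self):
        self.assertEqual(count_rows(["id,name", "1,a", "2,b"]), 2)

    def test_counts_every_row_without_header(self):
        self.assertEqual(count_rows(["1,a", "2,b"], skip_header=False), 2)

    def test_ignores_blank_lines(self):
        self.assertEqual(count_rows(["id,name", "", "1,a"]), 1)

    def test_empty_input(self):
        self.assertEqual(count_rows([]), 0)


if __name__ == "__main__":
    main()
```

Assuming the file is saved as `rowcount.py`, `python rowcount.py data.csv` prints a count for `data.csv`, and `python -m unittest rowcount` runs the tests. Keeping the logic in a plain function like `count_rows` is what makes it easy to test without touching the command line or the filesystem.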
We believe it’s important for analytics engineers to think beyond the data warehouse. We want to see analytics engineers collaborating with data scientists or machine learning engineers to build Python packages for internal use: maybe creating a command-line tool to monitor long-running model training jobs, working with analysts to build a tool that makes it easy to upload spreadsheets into the data warehouse, debugging a failed ingestion job, or tackling any number of other tasks that are commonplace on data teams.
Beyond the software applications these analytics engineers might build, we believe that it’s important for analytics engineers to have a baseline software engineering skill set that allows them to learn from other software engineers and participate in the broader software engineering dialogue and culture. If analytics engineers relegate themselves to being just “dbt developers”, we believe it will be a huge loss for the industry.