Training professionals means training them on the tools that professionals use. We believe that coding courses that teach students only how to use Jupyter notebooks, or to work only in contrived in-browser runtime environments, do their students a disservice by failing to give them fully transferable skills that they can take home or to their job.
If a pilot had only ever flown in a simulator I would not consider them “fully trained”.
At Analytics Engineers’ Club we believe that it’s absolutely critical to teach our students the tools that they’ll use on the job in a development environment that is as realistic as possible. In this blog post, I’ll talk about the custom infrastructure that we built in order to give our students that experience.
When Claire and I were first planning the course materials for Analytics Engineers Club we had lots of long discussions about what infrastructure we should use for turning data analysts into software engineers.
We faced a conundrum that many technology educators face when developing technical content: use easy-to-manage but unrealistic programming environments like Repl.it or Python notebooks, or suffer through trying to get every student’s local development environment set up in a reasonable way.
The first option allows students to code but doesn’t match what they’ll need to do on the job. The second option is a nightmare: when a student’s Windows machine or M1 Mac doesn’t behave the same way as everyone else’s, entire classes get lost to debugging.
We wanted the following characteristics for our students’ development environment:
- Access to a real bash command line with CLI git ability
- Ability to install software from the command line (dbt, python packages, etc.)
- Persistent sessions over time (so if they install dbt in class 2, they don’t have to re-install it for class 3)
- Ability to use a real code editor for programming
- A consistent environment for every student no matter the condition or operating system of the computer they’re using for class
We couldn’t find any out-of-the-box solutions that gave us all 5. Repl.it is a really impressive platform, and it ticks the boxes for 1, 2, and 4, but not 3 or 5. We considered setting up a JupyterHub server, which would have given us 3 and 5 but not 1, 2, or 4.
| | Web IDE | Local dev | Repl.it | JupyterHub |
|---|---|---|---|---|
| 1. Command line | ❌ | ✅ | ✅ | ❌ |
| 2. Able to install software | ❌ | ✅ | ✅ | ❌ |
| 3. Persistent sessions | ❌ | ✅ | ❌ | ✅ |
| 4. Use a local code editor | ❌ | ✅ | ✅ | ❌ |
| 5. Consistent environments | ✅ | ❌ | ❌ | ✅ |
We decided that in order to get infrastructure that would meet all of our requirements we’d have to build it ourselves.
If students connect to a remote machine that we set up for them, we can make sure they are all using the same infrastructure, and do some pre-configuration to make their lives easier — things like:
- making sure the $PATH variable is configured correctly
- pre-installing gcloud (since it’s a pretty challenging install)
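As a rough illustration of the kind of pre-configuration check described above, here is a minimal Python sketch that tests whether a directory appears on the PATH and whether a set of command-line tools can be found. This is not the script we actually run on our images; the directory `~/.local/bin` and the tool list are hypothetical values chosen only for the example.

```python
import os
import shutil


def path_contains(directory, path_value=None):
    """Return True if `directory` is one of the entries in a PATH-style string."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    # Normalize each entry so trailing slashes don't cause false negatives.
    entries = [os.path.normpath(p) for p in path_value.split(os.pathsep) if p]
    return os.path.normpath(directory) in entries


def missing_tools(tools):
    """Return the tools from `tools` that cannot be found on the current PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    # Hypothetical values: this directory and tool list are illustrative only.
    expected_dir = os.path.expanduser("~/.local/bin")
    print("PATH includes", expected_dir, "->", path_contains(expected_dir))
    print("Missing tools:", missing_tools(["git", "gcloud"]))
```

Running a check like this once when an image is built, rather than in class, is what keeps this sort of debugging out of class time.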
Better yet, VSCode has put a lot of thought into making sure developing on a remote machine works well — we were consistently impressed by small touches like the GitHub authentication, and forwarding remote ports to local ones when serving a website (like the dbt docs site).
By combining the VSCode remote-connection feature with some custom AWS EC2 images, we were able to check all of our boxes: our students can use a real professional code editor (VSCode) but they all are using the same consistent environment that we’ve prepared for them via EC2 (getting the ssh keys set up correctly took a bit of work, but that’s a story for another blog post).
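To make the connection step more concrete: VSCode's remote-connection feature can pick up hosts defined in a standard OpenSSH config file. The sketch below builds one such entry in Python. The alias, hostname, user name, and key path are all hypothetical placeholders, not the values we give our students, and the helper function `ssh_config_entry` is invented for this example.

```python
def ssh_config_entry(alias, hostname, user, identity_file):
    """Build one OpenSSH client config block for a remote machine.

    Host, HostName, User, and IdentityFile are standard ssh_config keywords.
    """
    lines = [
        f"Host {alias}",
        f"    HostName {hostname}",
        f"    User {user}",
        f"    IdentityFile {identity_file}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical values: replace with the details of a real instance.
    print(
        ssh_config_entry(
            alias="aec-student",
            hostname="ec2-host.example.com",
            user="ubuntu",
            identity_file="~/.ssh/aec_key",
        )
    )
```

With an entry like this saved in the SSH config file, a student could connect by alias (for example `ssh aec-student`) instead of typing the full hostname and key path every time.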
Connecting to an EC2 instance for the first time feels a little daunting, so we made sure to take extra care when writing those instructions. Once students are past that hurdle, though, they can pip install dbt without any problems whatsoever. Better yet, they can experiment as much as they like without messing up their computer: the worst-case scenario is that we need to spin up a new instance for them.
All in all, we’re really happy with where we landed — our students will be able to take everything they’ve learned and apply it directly to real computing environments they use at work. They know how to ssh into a remote server, they know how to use (or at least exit) vim, and they’re using git from the command line just like real software engineers. And that’s what’s really important.