Python

Python is a popular high-level, general-purpose object-oriented programming language. It enjoys broad support across all common operating environments. Its design philosophy emphasizes code readability with the use of significant indentation. Python is dynamically typed and garbage-collected. It supports multiple programming paradigms, including structured, object-oriented and functional programming.

Why do we use it: Python has become an ecosystem of choice for the data science community. By standardizing on Python we are assured open-source access to the most important third-party code libraries, such as Matplotlib, scikit-learn, pandas, SciPy, NumPy, seaborn and Keras.
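As a small illustration of the paradigms and dynamic typing described above, the sketch below computes the same sum of squares in a structured, an object-oriented and a functional style. The function and class names are made up for this example and do not come from the Smarter codebase.

```python
from functools import reduce


# Structured style: an explicit loop and an accumulator variable.
def sum_of_squares_structured(numbers):
    total = 0
    for n in numbers:
        total += n * n
    return total


# Object-oriented style: state and behavior bundled in a class.
class SquareAccumulator:
    def __init__(self):
        self.total = 0

    def add(self, n):
        self.total += n * n
        return self  # returning self allows chained calls


# Functional style: no mutation, just a fold over the input.
def sum_of_squares_functional(numbers):
    return reduce(lambda acc, n: acc + n * n, numbers, 0)


values = [1, 2, 3]
acc = SquareAccumulator()
for v in values:
    acc.add(v)

# Dynamic typing: the same name can be rebound to a value of another type.
result = sum_of_squares_structured(values)  # an int
result = str(result)                        # now a str
```

Note that indentation alone marks where each function, class and loop body begins and ends.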

Node.js

Node.js is a cross-platform, open-source JavaScript runtime environment that can run on Windows, Linux, Unix, macOS, and more. Node.js runs on the V8 JavaScript engine, and executes JavaScript code outside a web browser. Node.js lets developers use JavaScript to write command line tools and for server-side scripting.

Why do we use it: We’re less interested in Node.js itself than in its package manager, npm, which provides the supporting component libraries for our React-based chat front-end as well as the code quality tools that we use in this repo.

Django

Django is a free and open-source, Python-based web framework that runs on a web server. It follows the model–template–view architectural pattern. It is community supported and maintained by the Django Software Foundation.

Why do we use it: Aside from being a fantastic web framework, Django is arguably the best way to integrate Python with sophisticated backing services like MySQL, Redis and Celery. Django’s object-relational mapper (ORM) is unparalleled in this regard.

Django REST Framework

Django REST framework is a powerful and flexible toolkit for building Web APIs. One reason you might want to use it: its web-browsable API is a huge usability win for your developers.

Why do we use it: given that we’re standardized on Python-Django, Django REST framework is the gold standard for implementing sophisticated REST APIs.

Django Waffle

Waffle is a feature flipper for Django. You can define the conditions under which a flag should be active, and use it in a number of ways.

Pydantic

Pydantic is a Python package for data validation and manipulation. It lets you define and validate data through an API that is flexible and easy to use, and it integrates seamlessly with Python’s data structures.

Why do we use it: validating data is harder than it seems, and it’s thankless work. Pydantic does this WAY better than we ever could on our own.

Redis

Redis is a source-available, in-memory data store used as a distributed key–value database, cache and message broker, with optional durability.

Why do we use it: a couple of reasons actually. First, in order to run Python at scale you need an effective object caching strategy, which becomes surprisingly complex early on. Redis does this well, and so it’s common to see Python and Redis paired in large deployments. Additionally, Redis works well as a message broker. See ‘Celery’ below for details.
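To make the idea of an object caching strategy concrete, here is a minimal sketch of the common cache-aside pattern with expiring entries. It is an illustration only: `InMemoryCache` is a hypothetical in-process stand-in for a Redis client, and `load_user_from_db` and `get_user` are invented names, not functions from Smarter.

```python
import time


class InMemoryCache:
    """Hypothetical stand-in for a Redis client: a dict with per-key expiry."""

    def __init__(self):
        self._store = {}  # key -> (value, expires_at)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self._store[key]  # expired entries behave like a miss
            return None
        return value

    def set(self, key, value, ttl_seconds):
        self._store[key] = (value, time.monotonic() + ttl_seconds)


cache = InMemoryCache()
db_calls = []  # records every simulated database hit


def load_user_from_db(user_id):
    """Hypothetical slow lookup; a real app would query its database here."""
    db_calls.append(user_id)
    return {"id": user_id, "name": f"user-{user_id}"}


def get_user(user_id, ttl_seconds=60):
    """Cache-aside: try the cache first, fall back to the source, then store."""
    key = f"user:{user_id}"
    cached = cache.get(key)
    if cached is not None:
        return cached
    user = load_user_from_db(user_id)
    cache.set(key, user, ttl_seconds)
    return user


first = get_user(42)   # miss: hits the simulated database
second = get_user(42)  # hit: served from the cache
```

Even in this toy version the complexity is visible: someone has to choose key names, expiry times and what happens when the underlying data changes. A shared cache such as Redis additionally makes the cached objects available to every application process rather than just one.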

Celery

Celery is an open source asynchronous task queue or job queue which is based on distributed message passing. While it supports scheduling, its focus is on operations in real time.

Why do we use it: Celery is an excellent third-party library for managing asynchronous and distributed tasks, which are more complicated to manage than they might seem. Smarter primarily leverages Celery for generating and persisting customer billing data in real time.
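The sketch below shows the general shape of a task queue built on message passing, using only the Python standard library. It is not Celery's API: the in-process `queue.Queue` stands in for a message broker such as Redis, and `enqueue`, `worker` and `compute_invoice_total` are hypothetical names chosen for this example.

```python
import queue
import threading

task_queue = queue.Queue()  # plays the role of the message broker
results = {}                # plays the role of a result store
results_lock = threading.Lock()


def compute_invoice_total(line_items):
    """Hypothetical billing task: sum of quantity * unit price."""
    return sum(qty * price for qty, price in line_items)


def worker():
    """Pull messages off the queue until a None sentinel arrives."""
    while True:
        message = task_queue.get()
        if message is None:
            task_queue.task_done()
            break
        task_id, func, args = message
        outcome = func(*args)
        with results_lock:
            results[task_id] = outcome
        task_queue.task_done()


def enqueue(task_id, func, *args):
    """Producer side: returns immediately; the work happens on the worker."""
    task_queue.put((task_id, func, args))


worker_thread = threading.Thread(target=worker, daemon=True)
worker_thread.start()

enqueue("invoice-1", compute_invoice_total, [(2, 10), (1, 5)])
enqueue("invoice-2", compute_invoice_total, [(3, 7)])

task_queue.join()     # block until both tasks have been processed
task_queue.put(None)  # tell the worker to shut down
worker_thread.join()
```

A real deployment adds the hard parts that this sketch leaves out, such as workers on separate machines, retries, and durable delivery of messages, which is where a library like Celery earns its keep.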

ReactJS

React is a free and open-source front-end JavaScript library for building user interfaces based on components. It is maintained by Meta and a community of individual developers and companies. React can be used to develop single-page, mobile, or server-rendered applications, typically alongside a back-end API such as one built with Django REST framework.

Why do we use it: all modern chat applications (think, for example, WhatsApp Web) use front-end JavaScript frameworks that provide a means of manipulating the DOM without reloading the page on each change. Our chat app needs to measure up to this pretty high bar. Of those frameworks, React is arguably the most popular.

Cloud Infrastructure Technologies

Terraform

Terraform is an infrastructure-as-code software tool created by HashiCorp. Users define and provide data center infrastructure using a declarative configuration language known as HashiCorp Configuration Language, or optionally JSON.

Why do we use it: Setting up the cloud infrastructure for a serverless, horizontally scalable platform like Smarter is surprisingly complicated. And in our case we manage at least three fully-functional environments that run side-by-side. We’re highly reliant on Terraform to give us the means of confidently managing this much complexity.

Docker

Docker is a set of platform-as-a-service products that use OS-level virtualization to deliver software in packages called containers. The service has both free and premium tiers. The software that hosts the containers is called Docker Engine. It was first released in 2013 and is developed by Docker, Inc.

Why do we use it: The Smarter project consists of around 3 million lines of code written in dozens of languages. Only around 20,000 lines of this codebase were written by us. Building the application involves consistently pulling in several hundred interdependent third-party code libraries, consolidating thousands of web assets, and configuring all of this so that it runs reliably. Moreover, some of this code behaves differently depending on the OS environment and filesystem in which it runs. Docker provides a means of managing this level of complexity in a simple, declarative way that is, for the most part, platform agnostic. We use Docker for local development as well as for deploying to Kubernetes on AWS.

AWS Elastic Container Registry

Amazon Elastic Container Registry (Amazon ECR) is a fully managed container registry offering high-performance hosting, so you can reliably deploy application images and artifacts anywhere.

Why do we use it: ECR is a private alternative to Docker Hub. It’s a private place in the cloud where we can store the Docker images that we build. Once we push an image to ECR, it becomes available to any Docker-based compute service inside of AWS.

Kubernetes

Kubernetes is an open-source container orchestration system for automating software deployment, scaling, and management. Originally designed by Google, the project is now maintained by a worldwide community of contributors, and the trademark is held by the Cloud Native Computing Foundation.

Why do we use it: as you scale, managing where to run individual containers becomes unwieldy. Kubernetes does this exceptionally well. It does a lot of other things well, but in our case container orchestration is the overarching reason we use Kubernetes.

Helm

Helm helps you manage Kubernetes applications. Helm Charts help you define, install, and upgrade even the most complex Kubernetes application. Charts are easy to create, version, share, and publish.

Why do we use it: Helm is the Kubernetes equivalent of the docker-compose.yml file that this repo uses to deploy the application and its backing services locally on your dev machine. It serves exactly the same purpose, and also provides useful version management features.

GitHub Actions

GitHub Actions is a continuous integration and continuous delivery (CI/CD) platform that allows you to automate your build, test, and deployment pipeline. You can create workflows that build and test every pull request to your repository, or deploy merged pull requests to production.

Why do we use it: first, automating build and deployment workflows saves us tons of time and gives us more consistent final results. Second, amongst competing CI/CD platforms, GitHub Actions stands apart for its seamless integration with GitHub repos as well as its substantial third-party component support for other technology ecosystems like Docker, AWS and Kubernetes. Our GitHub Actions workflows are remarkably resilient, easy to read, and require minimal maintenance.

Dependabot & Mergify

Dependabot is a feature of GitHub whose main purpose is to help developers stay on top of their dependency ecosystem. It does this by automating the dependency update process, which in turn helps address potential security concerns proactively.

Mergify is an automation service that helps to streamline the process of merging pull requests on GitHub. It allows you to define rules for merging pull requests, which can help to automate your workflow. For example, you can set up Mergify to automatically merge pull requests when they pass all status checks, or when they’ve been approved by a certain number of reviewers. This can help to save time and ensure that your project’s main branch is always up to date with the latest changes.

Why do we use them: Some of the third-party packages on which we rely (openai, langchain and React all come to mind) are frequently updated. It’s not uncommon for dozens of Smarter’s third-party requirements to be updated over the course of one week. Combined with our automated build-deploy workflow for our test environment, these tools let us automatically update these requirements, re-test everything, and, if all goes well, propagate the changes to all branches in the repo. This keeps our codebase in tip-top condition while saving us tons of time.