Deployment & MLOps
FastSearch is deployed on a serverless architecture to minimize hosting costs while allowing horizontal scaling. The frontend is served from an AWS S3 bucket behind AWS CloudFront, which acts as a CDN: it caches static files, terminates TLS connections closer to the user, and reverse-proxies the backend API. The backend runs as a containerized AWS App Runner service.
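The routing described above can be pictured as an ordered list of path patterns, each mapped to an origin, with the S3 bucket as the fallback. The sketch below is only a model of that idea in plain Python, not the actual CloudFront configuration: the `/api/*` pattern and both origin labels are assumptions, since the source does not state them.

```python
from fnmatch import fnmatchcase

# Hypothetical origin labels; the real bucket and service names are not given.
FRONTEND_ORIGIN = "s3-frontend-bucket"
BACKEND_ORIGIN = "app-runner-backend"

# Ordered (path pattern, origin) pairs. The "/api/*" prefix is an assumption.
BEHAVIORS = [("/api/*", BACKEND_ORIGIN)]


def select_origin(path: str) -> str:
    """Return the origin that would serve `path`; the first matching pattern wins."""
    for pattern, origin in BEHAVIORS:
        if fnmatchcase(path, pattern):
            return origin
    # Anything that matches no pattern falls through to the static frontend.
    return FRONTEND_ORIGIN


if __name__ == "__main__":
    for path in ("/index.html", "/assets/app.js", "/api/search"):
        print(path, "->", select_origin(path))
```

Serving both origins under one domain this way means the browser only ever talks to the CDN, so the frontend can call the API without cross-origin configuration.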
CI/CD Pipeline
FastSearch uses AWS CDK as its infrastructure-as-code framework for declarative configuration and deployment. This lets FastSearch version its infrastructure alongside the source code and reuse prebuilt constructs that encode best practices, such as CDN cache invalidation on frontend deployments. GitHub Actions builds the frontend static files and the backend Docker container, and deploys any infrastructure changes, whenever a feature branch is merged into main on the FastSearch GitHub repo.
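As a rough illustration of the cache-invalidation point above, the following AWS CDK (Python) fragment shows one way a frontend deployment can be tied to a CloudFront invalidation. It is a sketch under assumptions, not the FastSearch stack itself: the class name `FrontendStack`, the construct IDs, and the `frontend/dist` build directory are all hypothetical, and it requires the `aws-cdk-lib` and `constructs` packages.

```python
from aws_cdk import (
    Stack,
    aws_cloudfront as cloudfront,
    aws_cloudfront_origins as origins,
    aws_s3 as s3,
    aws_s3_deployment as s3deploy,
)
from constructs import Construct


class FrontendStack(Stack):
    """Hypothetical stack: static frontend in S3, served through CloudFront."""

    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)

        # Bucket that holds the built static files.
        bucket = s3.Bucket(self, "FrontendBucket")

        # CDN in front of the bucket. S3Origin is marked deprecated in recent
        # CDK releases; newer code may prefer S3BucketOrigin instead.
        distribution = cloudfront.Distribution(
            self,
            "FrontendCdn",
            default_behavior=cloudfront.BehaviorOptions(
                origin=origins.S3Origin(bucket),
                viewer_protocol_policy=cloudfront.ViewerProtocolPolicy.REDIRECT_TO_HTTPS,
            ),
            default_root_object="index.html",
        )

        # Upload the build output (assumed path) and invalidate cached paths
        # on the distribution after each deployment.
        s3deploy.BucketDeployment(
            self,
            "DeployFrontend",
            sources=[s3deploy.Source.asset("frontend/dist")],
            destination_bucket=bucket,
            distribution=distribution,
            distribution_paths=["/*"],
        )
```

Passing `distribution` and `distribution_paths` to `BucketDeployment` is what triggers the invalidation, so no separate invalidation step is needed in the CI workflow.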
Model Redeployment
FastSearch uses Hugging Face Model Hub webhooks and GitHub Actions to rebuild and redeploy the backend Docker image whenever the cross-encoder or bi-encoder model weights are updated. When the bi-encoder is updated, an additional batch of inference jobs in the FastSearch data pipeline is triggered to rebuild the ANN index over all lecture transcripts, since embeddings produced by the previous weights are not comparable with those of the new model.
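The redeployment rule above can be sketched as a small dispatcher that inspects a webhook payload and decides which jobs to start. This is a minimal sketch rather than the project's actual handler: the model repository names, the job names, and both function names are hypothetical, and the payload fields (`event.action`, `event.scope`, `repo.type`, `repo.name`) follow the Hugging Face webhook payload shape as commonly documented. The request builder targets GitHub's repository dispatch endpoint, which is one possible way to start a GitHub Actions workflow from outside GitHub; it only constructs the request and does not send it.

```python
import json
import urllib.request

# Hypothetical model repository names; the source does not give the real ones.
BI_ENCODER_REPO = "fastsearch/bi-encoder"
CROSS_ENCODER_REPO = "fastsearch/cross-encoder"


def plan_jobs(payload: dict) -> list[str]:
    """Map a webhook payload to the (hypothetical) jobs that should run."""
    event = payload.get("event", {})
    repo = payload.get("repo", {})
    # A content update covers any new commit to the model repo, so this
    # check is broader than "weights changed" in the strict sense.
    is_content_update = (
        repo.get("type") == "model"
        and event.get("scope") == "repo.content"
        and event.get("action") == "update"
    )
    if not is_content_update:
        return []
    if repo.get("name") == BI_ENCODER_REPO:
        # New bi-encoder weights invalidate the stored embeddings.
        return ["rebuild-backend", "rebuild-ann-index"]
    if repo.get("name") == CROSS_ENCODER_REPO:
        return ["rebuild-backend"]
    return []


def build_dispatch_request(job: str, github_repo: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a repository_dispatch request for `job`."""
    body = json.dumps({"event_type": job}).encode("utf-8")
    return urllib.request.Request(
        f"https://api.github.com/repos/{github_repo}/dispatches",
        data=body,
        method="POST",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
        },
    )


if __name__ == "__main__":
    sample = {
        "event": {"action": "update", "scope": "repo.content"},
        "repo": {"type": "model", "name": BI_ENCODER_REPO},
    }
    print(plan_jobs(sample))
```

Keeping the decision logic in a pure function separate from the HTTP call makes the rule easy to unit-test without network access or credentials.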