The AI Engineer Cheat Sheet
At Clearly AI, we’re firm believers that “buy” is not always the answer for every organization. We also believe that everyone in the technology space should learn how to build AI applications and use AI to solve problems in their enterprise. We’ve put together a high-level cheat sheet to get you started, and we’re always happy to consult with you on the best ways to simplify your AI workflows.
Understanding the Icons
As security & privacy engineers, we’ve added notations for the best tooling from a privacy-preserving lens (e.g., companies that don’t train on your data), a security lens (e.g., companies with self-hosting options), and a compliance lens (e.g., companies with SOC 2 certifications). We’ve also added notations for open source software and tooling with a generous free tier. The notation may not be complete (may be missing icons), but we did our best!
- 💭 = Privacy-conscious (Zero Data Retention option, Do Not Train option)
- 🔒 = Security-conscious (self-hosted option for non-open-source tooling)
- 🥈 = SOC 2 compliant
- 📖 = Open source
- 🆓 = Very generous free tier
Step 0: Define the problem
What are you trying to achieve? What problem are you trying to solve? How are you planning to leverage AI?
This part is the most important, and the hardest area for us to give generic advice. If you’re looking for specific pointers on which security, privacy, and compliance problems are best addressed with AI, set up some time to chat.
Step 1: Model Selection
Pick the right model type for the task
Multi-model Support
You’ll probably want your pick of a variety of model types, depending on your task.
Some options:
- Fireworks 💭🥈 (open-source, OpenAI, Google, and Microsoft models)
- Parasail (open-source only)
- Amazon Bedrock 💭🔒🥈 (open-source, Anthropic, Cohere, and Amazon models)
- Azure AI Foundry 💭🔒🥈 (open-source, OpenAI, xAI, and Microsoft models)
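To keep a multi-model setup swappable, a thin routing layer helps so call sites never hard-code a provider. A minimal sketch in Python; all model identifiers below are illustrative placeholders, not real endpoint names:

```python
# Map task types to model identifiers so you can swap providers
# without touching call sites. Names are placeholders.
MODEL_ROUTES = {
    "reasoning": "anthropic/some-large-model",    # placeholder
    "cheap_bulk": "open-source/some-small-model", # placeholder
    "embedding": "provider/some-embedding-model", # placeholder
}

def pick_model(task_type: str) -> str:
    """Return the configured model for a task, or raise if unknown."""
    try:
        return MODEL_ROUTES[task_type]
    except KeyError:
        raise ValueError(f"No model configured for task: {task_type!r}")
```

Swapping providers then becomes a one-line config change rather than a code change.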
Major Labs
These are generally well-known and self-explanatory. OpenAI and Anthropic both offer Zero Data Retention. Of note: there’s a major difference between using AI providers via their APIs and via their chat products like ChatGPT and Claude; the Terms of Use and system prompts differ widely.
Some options:
- OpenAI 💭🥈 - note that you need to request Zero Data Retention from the team
- Anthropic 💭🥈 - note that you need to request Zero Data Retention from the team
- Cohere 🔒🥈
- Google (Gemini) - note that we were unable to find security & privacy information outside of Gemini for Google Workspace
Fine Tuning
Unsloth 📖 - a popular library for fast training and fine-tuning of open-source models
Step 2: Prompt Engineering
Prompt Frameworks
Out-of-the-box (OOTB) frameworks for when you don’t want to interface with the LLM APIs directly.
Some options:
- LangChain 📖 - a standard interface for LLMs
- LlamaIndex 📖 - specializes in vector stores / RAG
- Marvin 📖 - supports structured outputs, memory, and more
Structured Outputs
Structured outputs are hugely important for reliable responses for programmatic consumption.
Some options:
- Instructor 📖 - the most popular structured outputs library
- BAML 📖 - a parser that aligns outputs to schemas for structured outputs. BAML also makes it extremely easy to switch between models and model providers (to prevent dependency on any one API).
- OpenAI / Anthropic / Gemini APIs - each provide their own native structured outputs, though they are not necessarily as effective as solutions like BAML
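Whatever library you choose, the underlying idea is the same: validate the model’s output against a schema before your program consumes it. A minimal stdlib-only sketch of that idea; the JSON string stands in for a hypothetical model response:

```python
import json
from dataclasses import dataclass

@dataclass
class Invoice:
    vendor: str
    total_usd: float

def parse_invoice(raw: str) -> Invoice:
    """Validate that a model's JSON output matches the Invoice schema."""
    data = json.loads(raw)
    if not isinstance(data.get("vendor"), str):
        raise ValueError("vendor must be a string")
    if not isinstance(data.get("total_usd"), (int, float)):
        raise ValueError("total_usd must be a number")
    return Invoice(vendor=data["vendor"], total_usd=float(data["total_usd"]))

# Pretend this JSON came back from an LLM asked to extract invoice fields:
invoice = parse_invoice('{"vendor": "Acme Corp", "total_usd": 1299.5}')
```

In a real pipeline, a validation failure is often fed back to the model as a retry prompt; libraries like Instructor automate that loop.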
Memory
Some options:
- mem0 📖 - a memory layer that allows for search across previous interactions
- TypeAgent 📖 - new memory capabilities out of Microsoft
- Zep 🔒🥈📖 - supports continuous learning
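To make the idea concrete, here is a toy memory layer with keyword search; real tools like mem0 or Zep use embeddings, ranking, and persistent storage instead of this in-memory sketch:

```python
from dataclasses import dataclass, field

@dataclass
class MemoryStore:
    """Toy memory layer: store past interactions, search by keyword."""
    entries: list = field(default_factory=list)

    def add(self, role: str, text: str) -> None:
        self.entries.append((role, text))

    def search(self, query: str) -> list:
        """Return all stored texts containing the query (case-insensitive)."""
        q = query.lower()
        return [text for _, text in self.entries if q in text.lower()]

memory = MemoryStore()
memory.add("user", "My deployment region is us-east-1")
memory.add("assistant", "Noted: region us-east-1")
# Later turns can retrieve relevant context to prepend to the prompt:
relevant = memory.search("region")
```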
Step 3: Evaluations
Evaluations are critical for determining whether your AI application is performing as expected. There are many evaluation frameworks out there, and a lot of investment in evaluation across AI companies.
Some options:
- Braintrust 💭🔒🥈🆓 - used by industry leaders like Notion; note that their SDK is open source
- Freeplay 💭🔒🥈
- Arize Phoenix 🥈📖
- Langfuse 🥈📖 - more of an observability play than pure evaluations
- LangSmith - from the LangChain team, for debugging & testing
- Logfire 🔒🥈 - from the Pydantic team
- Replicate
- Tensorfuse 💭🔒🥈🆓 - built from the ground up to be self-hosted
- Coval 🥈 - evaluations for voice agents
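At its core, an evaluation harness runs your application over known cases and scores the outputs. A deliberately minimal sketch; the app under test is a stub standing in for your LLM pipeline, and real frameworks add LLM-as-judge scorers, datasets, and dashboards:

```python
def exact_match(expected: str, actual: str) -> bool:
    """Simplest possible scorer; frameworks offer far richer ones."""
    return expected.strip().lower() == actual.strip().lower()

def run_eval(cases, app) -> float:
    """Run an app over (input, expected) cases and return the pass rate."""
    passed = sum(exact_match(exp, app(inp)) for inp, exp in cases)
    return passed / len(cases)

# Hypothetical app under test: a stub standing in for an LLM pipeline.
def toy_app(question: str) -> str:
    return "paris" if "France" in question else "unknown"

cases = [("Capital of France?", "Paris"), ("Capital of Mars?", "Olympus")]
score = run_eval(cases, toy_app)  # 0.5: one of the two cases passes
```

Tracking this score across prompt and model changes is what turns evaluation into a regression test suite for your AI application.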
Feedback
DSPy 📖 - prompt optimization
Step 4: Robust Integrations
A good AI application doesn’t exist in a silo. You probably want to enable interactions with webpages or other applications.
Tool Use
Some LLM providers support tool use out of the box today, such as OpenAI’s Web Search. In addition to these out-of-the-box capabilities, you can register “tools” with your LLM in the system prompt by describing API calls that the LLM can choose to invoke.
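A hand-rolled sketch of that prompt-registered pattern: describe the tools in the system prompt, then parse and dispatch whatever tool call the model replies with. The model reply below is simulated, and real provider APIs offer structured tool-calling interfaces that make the parsing more robust:

```python
import json

# Tools you expose to the model; their descriptions go in the system prompt.
def get_weather(city: str) -> str:
    return f"Sunny in {city}"  # stub; a real tool would call a weather API

TOOLS = {"get_weather": get_weather}

SYSTEM_PROMPT = (
    "You may call these tools by replying with JSON "
    '{"tool": <name>, "args": {...}}:\n'
    "- get_weather(city): current weather for a city"
)

def dispatch(model_reply: str) -> str:
    """Parse the model's tool call and execute the matching function."""
    call = json.loads(model_reply)
    fn = TOOLS[call["tool"]]
    return fn(**call["args"])

# Pretend the LLM chose to call the weather tool:
result = dispatch('{"tool": "get_weather", "args": {"city": "Lisbon"}}')
```

The tool result is then fed back to the model as another message so it can compose a final answer.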
Agent <> Web Page Interactions
Some options:
- Tambo 📖 - add AI-generated React components to your UI
- AG-UI 📖 - a protocol for agents to communicate with front-end frameworks
- NLWeb 📖 - a way for agents to interact with websites
Integration Protocols
- MCP (Model Context Protocol) - developed by Anthropic; generally seen as the favorite industry standard
- A2A (Agent2Agent) - developed by Google
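For a feel of MCP’s wire format: it runs over JSON-RPC 2.0, with methods such as `tools/call` for invoking a server-side tool. A minimal sketch of building one such request; the tool name and arguments here are hypothetical:

```python
import json

def mcp_tool_call(request_id: int, tool: str, arguments: dict) -> str:
    """Build an MCP 'tools/call' request (MCP messages are JSON-RPC 2.0)."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    })

# Hypothetical tool exposed by an MCP server:
msg = mcp_tool_call(1, "search_docs", {"query": "retention policy"})
```

In practice an MCP SDK handles framing, transport, and responses; this only shows the request shape.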
Goal: Create a Flywheel
Ways to create a positive feedback loop, make yourself more productive, and continue to accelerate your product.
Code editors
Some options:
UI Generation
Use AI to create dynamic user interfaces.
Some options:
- v0 - made by Vercel, the company behind Next.js
- bolt.new - a viral favorite
- Polymet - allows you to comment on specific areas you’d like updated, just like you would with a Figma file and your designer
Need help?
We're happy to chat with you further about building AI-based applications. Feel free to contact us at support@clearly-ai.com.