
Commit ceeaa91

Complete AI Engineer Roadmap (#7508)
* ai eng content
* 57 topics
* 44 topics
* 68 topics, need to add links to the final 15 or so
* final topics
* update copy and links
* Update [email protected]
  Co-authored-by: Kamran Ahmed <[email protected]>
* Update [email protected]
  Co-authored-by: Kamran Ahmed <[email protected]>
* Update [email protected]
  Co-authored-by: Kamran Ahmed <[email protected]>
* Update introduction@_hYN0gEi9BL24nptEtXWU.md
  Co-authored-by: Kamran Ahmed <[email protected]>
* Update [email protected]
  Co-authored-by: Kamran Ahmed <[email protected]>
* resolve comments
* Update src/data/roadmaps/ai-engineer/content/[email protected]
* Update src/data/roadmaps/ai-engineer/content/[email protected]
* Update src/data/roadmaps/ai-engineer/content/chunking@mX987wiZF7p3V_gExrPeX.md
* Update src/data/roadmaps/ai-engineer/content/[email protected]
* Update src/data/roadmaps/ai-engineer/content/[email protected]
* Update src/data/roadmaps/ai-engineer/content/manual-implementation@6xaRB34_g0HGt-y1dGYXR.md
* Update src/data/roadmaps/ai-engineer/content/[email protected]
* Update src/data/roadmaps/ai-engineer/content/[email protected]
* Update src/data/roadmaps/ai-engineer/content/[email protected]
* Update src/data/roadmaps/ai-engineer/content/popular-open-source-models@97eu-XxYUH9pYbD_KjAtA.md

---------

Co-authored-by: Kamran Ahmed <[email protected]>
1 parent ee3736b commit ceeaa91

File tree

106 files changed: +757 −125 lines changed


src/data/roadmaps/ai-engineer/content/agents-usecases@778HsQzTuJ_3c9OSn5DmH.md (+1 −1)

@@ -4,6 +4,6 @@ AI Agents have a variety of usecases ranging from customer support, workflow aut
 
 Visit the following resources to learn more:
 
-- [@article@Top 15 Use Cases Of AI Agents In Business](https://www.ampcome.com/post/15-use-cases-of-ai-agents-in-business)
+- [@article@Top 15 Use Cases Of AI Agents In Business](https://www.ampcome.com/post/15-use-cases-of-ai-agents-in-business)
 - [@article@A Brief Guide on AI Agents: Benefits and Use Cases](https://www.codica.com/blog/brief-guide-on-ai-agents/)
 - [@video@The Complete Guide to Building AI Agents for Beginners](https://youtu.be/MOyl58VF2ak?si=-QjRD_5y3iViprJX)
@@ -1 +1,9 @@
-# AI Agents
+# AI Agents
+
+In AI engineering, "agents" refer to autonomous systems or components that can perceive their environment, make decisions, and take actions to achieve specific goals. Agents often interact with external systems, users, or other agents to carry out complex tasks. They can vary in complexity, from simple rule-based bots to sophisticated AI-powered agents that leverage machine learning models, natural language processing, and reinforcement learning.
+
+Visit the following resources to learn more:
+
+- [@article@Building an AI Agent Tutorial - LangChain](https://python.langchain.com/docs/tutorials/agents/)
+- [@article@AI Agents and Their Types](https://play.ht/blog/ai-agents-use-cases/)
+- [@video@The Complete Guide to Building AI Agents for Beginners](https://youtu.be/MOyl58VF2ak?si=-QjRD_5y3iViprJX)
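The perceive–decide–act cycle described above can be made concrete with a minimal rule-based agent loop. This sketch is illustrative only (not part of this commit); every name is made up, and in a real agent the `decide` step would typically call an LLM or a learned policy rather than hard-coded rules.

```python
# Minimal sketch of an agent's perceive -> decide -> act loop.
# All names are illustrative; a real agent would query an LLM or planner.

def perceive(environment: dict) -> dict:
    """Read the observable state of the environment."""
    return {"temperature": environment["temperature"]}

def decide(observation: dict) -> str:
    """Pick an action from simple rules (a stand-in for an LLM/policy)."""
    return "cool" if observation["temperature"] > 25 else "idle"

def act(action: str, environment: dict) -> None:
    """Apply the chosen action back to the environment."""
    if action == "cool":
        environment["temperature"] -= 5

def run_agent(environment: dict, steps: int = 3) -> list[str]:
    """Run the perceive-decide-act loop for a fixed number of steps."""
    actions = []
    for _ in range(steps):
        action = decide(perceive(environment))
        act(action, environment)
        actions.append(action)
    return actions

print(run_agent({"temperature": 32}))  # cools twice, then idles
```

The same loop structure underlies frameworks like LangChain's agents; they swap the rule-based `decide` for model-driven tool selection.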
@@ -1,8 +1,9 @@
 # AI Engineer vs ML Engineer
 
-An AI Engineer develops broad AI solutions, such as chatbots, NLP, and intelligent automation, focusing on integrating AI technologies into large applications. In contrast, an ML Engineer is more focused on building and deploying machine learning models, handling data processing, model training, and optimization in production environments.
+An AI Engineer uses pre-trained models and existing AI tools to improve user experiences. They focus on applying AI in practical ways, without building models from scratch. This is different from AI Researchers and ML Engineers, who focus more on creating new models or developing AI theory.
 
-Visit the following resources to learn more:
+Learn more from the following resources:
 
-- [@article@AI Engineer vs. ML Engineer: Duties, Skills, and Qualifications](https://www.upwork.com/resources/ai-engineer-vs-ml-engineer)
-- [@video@AI Developer vs ML Engineer: What’s the difference?](https://www.youtube.com/watch?v=yU87V2-XisA&t=2s)
+- [@article@What does an AI Engineer do?](https://www.codecademy.com/resources/blog/what-does-an-ai-engineer-do/)
+- [@article@What is an ML Engineer?](https://www.coursera.org/articles/what-is-machine-learning-engineer)
+- [@video@AI vs ML](https://www.youtube.com/watch?v=4RixMPF4xis)
@@ -1 +1,8 @@
-# AI Safety and Ethics
+# AI Safety and Ethics
+
+AI safety and ethics involve establishing guidelines and best practices to ensure that artificial intelligence systems are developed, deployed, and used in a manner that prioritizes human well-being, fairness, and transparency. This includes addressing risks such as bias, privacy violations, and unintended consequences, and ensuring that AI operates reliably and predictably, even in complex environments. Ethical considerations focus on promoting accountability, avoiding discrimination, and aligning AI systems with human values and societal norms. Frameworks like explainability, human-in-the-loop design, and robust monitoring are often used to build systems that not only achieve technical objectives but also uphold ethical standards and mitigate potential harms.
+
+Learn more from the following resources:
+
+- [@video@What is AI Ethics?](https://www.youtube.com/watch?v=aGwYtUzMQUk)
+- [@article@Understanding artificial intelligence ethics and safety](https://www.turing.ac.uk/news/publications/understanding-artificial-intelligence-ethics-and-safety)
@@ -1 +1,8 @@
-# AI vs AGI
+# AI vs AGI
+
+AI (Artificial Intelligence) refers to systems designed to perform specific tasks by mimicking aspects of human intelligence, such as pattern recognition, decision-making, and language processing. These systems, known as "narrow AI," are highly specialized, excelling in defined areas like image classification or recommendation algorithms but lacking broader cognitive abilities. In contrast, AGI (Artificial General Intelligence) represents a theoretical form of intelligence that possesses the ability to understand, learn, and apply knowledge across a wide range of tasks at a human-like level. AGI would have the capacity for abstract thinking, reasoning, and adaptability similar to human cognitive abilities, making it far more versatile than today’s AI systems. While current AI technology is powerful, AGI remains a distant goal and presents complex challenges in safety, ethics, and technical feasibility.
+
+Learn more from the following resources:
+
+- [@article@What is AGI?](https://aws.amazon.com/what-is/artificial-general-intelligence/)
+- [@article@The crucial difference between AI and AGI](https://www.forbes.com/sites/bernardmarr/2024/05/20/the-crucial-difference-between-ai-and-agi/)
@@ -1 +1,7 @@
-# Anomaly Detection
+# Anomaly Detection
+
+Anomaly detection with embeddings works by transforming data, such as text, images, or time-series data, into vector representations that capture their patterns and relationships. In this high-dimensional space, similar data points are positioned close together, while anomalies stand out as those that deviate significantly from the typical distribution. This approach is highly effective for detecting outliers in tasks like fraud detection, network security, and quality control.
+
+Learn more from the following resources:
+
+- [@article@Anomaly in Embeddings](https://ai.google.dev/gemini-api/tutorials/anomaly_detection)
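The "deviate significantly from the typical distribution" idea above can be sketched in a few lines: embed the data, compute a centroid, and flag points that sit too far away. This is an illustrative toy (the vectors are made up, and a real system would get them from an embedding model and use a statistically chosen threshold):

```python
import math

# Toy sketch: flag vectors that sit far from the centroid of the dataset.
# In practice the vectors would come from an embedding model.

def centroid(vectors: list[list[float]]) -> list[float]:
    """Mean vector of the dataset."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

def distance(a: list[float], b: list[float]) -> float:
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def find_anomalies(vectors: list[list[float]], threshold: float) -> list[int]:
    """Return indices of vectors farther than `threshold` from the centroid."""
    c = centroid(vectors)
    return [i for i, v in enumerate(vectors) if distance(v, c) > threshold]

embeddings = [[0.1, 0.2], [0.11, 0.19], [0.09, 0.21], [0.9, 0.95]]
print(find_anomalies(embeddings, threshold=0.5))  # the last vector stands out
```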
@@ -1 +1,8 @@
-# Anthropic's Claude
+# Anthropic's Claude
+
+Anthropic's Claude is an AI language model designed to facilitate safe and scalable AI systems. Named after Claude Shannon, the father of information theory, Claude focuses on responsible AI use, emphasizing safety, alignment with human intentions, and minimizing harmful outputs. Built as a competitor to models like OpenAI's GPT, Claude is designed to handle natural language tasks such as generating text, answering questions, and supporting conversations, with a strong focus on aligning AI behavior with user goals while maintaining transparency and avoiding harmful biases.
+
+Learn more from the following resources:
+
+- [@official@Claude Website](https://claude.ai)
+- [@video@How To Use Claude Pro For Beginners](https://www.youtube.com/watch?v=J3X_JWQkvo8)
@@ -1 +1,8 @@
-# Audio Processing
+# Audio Processing
+
+Audio processing in multimodal AI enables a wide range of use cases by combining sound with other data types, such as text, images, or video, to create more context-aware systems. Use cases include speech recognition paired with real-time transcription and visual analysis in meetings or video conferencing tools, voice-controlled virtual assistants that can interpret commands in conjunction with on-screen visuals, and multimedia content analysis where audio and visual elements are analyzed together for tasks like content moderation or video indexing.
+
+Learn more from the following resources:
+
+- [@article@The State of Audio Processing](https://appwrite.io/blog/post/state-of-audio-processing)
+- [@video@Audio Signal Processing for Machine Learning](https://www.youtube.com/watch?v=iCwMQJnKk2c)
@@ -1 +1,8 @@
-# AWS Sagemaker
+# AWS SageMaker
+
+AWS SageMaker is a fully managed machine learning service from Amazon Web Services that enables developers and data scientists to build, train, and deploy machine learning models at scale. It provides an integrated development environment, simplifying the entire ML workflow, from data preparation and model development to training, tuning, and inference. SageMaker supports popular ML frameworks like TensorFlow, PyTorch, and Scikit-learn, and offers features like automated model tuning, model monitoring, and one-click deployment. It's designed to make machine learning more accessible and scalable, even for large enterprise applications.
+
+Learn more from the following resources:
+
+- [@official@AWS SageMaker](https://aws.amazon.com/sagemaker/)
+- [@video@Introduction to Amazon SageMaker](https://www.youtube.com/watch?v=Qv_Tr_BCFCQ)
@@ -1 +1,8 @@
-# Azure AI
+# Azure AI
+
+Azure AI is a suite of AI services and tools provided by Microsoft through its Azure cloud platform. It includes pre-built AI models for natural language processing, computer vision, and speech, as well as tools for developing custom machine learning models using services like Azure Machine Learning. Azure AI enables developers to integrate AI capabilities into applications with APIs for tasks like sentiment analysis, image recognition, and language translation. It also supports responsible AI development with features for model monitoring, explainability, and fairness, aiming to make AI accessible, scalable, and secure across industries.
+
+Learn more from the following resources:
+
+- [@official@Azure AI](https://azure.microsoft.com/en-gb/solutions/ai)
+- [@video@How to Choose the Right Models for Your Apps](https://www.youtube.com/watch?v=sx_uGylH8eg)
@@ -1 +1,8 @@
-# Benefits of Pre-trained Models
+# Benefits of Pre-trained Models
+
+Pre-trained models offer several benefits in AI engineering by significantly reducing development time and computational resources. Because these models are trained on large datasets and can be fine-tuned for specific tasks, they enable quicker deployment and better performance with less data. They help overcome the challenge of needing vast amounts of labeled data and computational power for training from scratch. Additionally, pre-trained models often demonstrate improved accuracy, generalization, and robustness across different tasks, making them ideal for applications in natural language processing, computer vision, and other AI domains.
+
+Learn more from the following resources:
+
+- [@article@Why Pre-Trained Models Matter For Machine Learning](https://www.ahead.com/resources/why-pre-trained-models-matter-for-machine-learning/)
+- [@article@Why You Should Use Pre-Trained Models Versus Building Your Own](https://cohere.com/blog/pre-trained-vs-in-house-nlp-models)
@@ -1 +1,9 @@
-# Bias and Fareness
+# Bias and Fairness
+
+Bias and fairness in AI refer to the challenges of ensuring that machine learning models do not produce discriminatory or skewed outcomes. Bias can arise from imbalanced training data, flawed assumptions, or biased algorithms, leading to unfair treatment of certain groups based on race, gender, or other factors. Fairness aims to address these issues by developing techniques to detect, mitigate, and prevent biases in AI systems. Ensuring fairness involves improving data diversity, applying fairness constraints during model training, and continuously monitoring models in production to avoid unintended consequences, promoting ethical and equitable AI use.
+
+Learn more from the following resources:
+
+- [@article@What Do We Do About the Biases in AI?](https://hbr.org/2019/10/what-do-we-do-about-the-biases-in-ai)
+- [@article@AI Bias - What Is It and How to Avoid It?](https://levity.ai/blog/ai-bias-how-to-avoid)
+- [@article@What about fairness, bias and discrimination?](https://ico.org.uk/for-organisations/uk-gdpr-guidance-and-resources/artificial-intelligence/guidance-on-ai-and-data-protection/how-do-we-ensure-fairness-in-ai/what-about-fairness-bias-and-discrimination/)
@@ -1 +1,8 @@
-# Capabilities / Context Length
+# Capabilities / Context Length
+
+A key aspect of the OpenAI models is their context length, which refers to the amount of input text the model can process at once. Earlier models like GPT-3 had a context length of up to 4,096 tokens (words or word pieces), while more recent models like GPT-4 can handle significantly larger context lengths, some supporting up to 32,768 tokens. This extended context length enables the models to handle more complex tasks, such as maintaining long conversations or processing lengthy documents, which enhances their utility in real-world applications like legal document analysis or code generation.
+
+Learn more from the following resources:
+
+- [@official@Managing Context](https://platform.openai.com/docs/guides/text-generation/managing-context-for-text-generation)
+- [@official@Capabilities](https://platform.openai.com/docs/guides/text-generation)
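Because the context window is finite, applications usually trim old conversation turns to stay within budget. Here is a minimal sketch of that idea; the whitespace "tokenizer" is a deliberate simplification standing in for a real one such as OpenAI's `tiktoken`, and `trim_history` is an illustrative helper, not a library function:

```python
# Sketch: keep only the most recent messages within a fixed token budget.
# A naive whitespace split stands in for a real tokenizer like tiktoken.

def count_tokens(text: str) -> int:
    """Crude token estimate: one token per whitespace-separated word."""
    return len(text.split())

def trim_history(messages: list[dict], budget: int) -> list[dict]:
    """Drop the oldest messages until the conversation fits the budget."""
    kept, used = [], 0
    for message in reversed(messages):  # walk newest-first
        cost = count_tokens(message["content"])
        if used + cost > budget:
            break
        kept.append(message)
        used += cost
    return list(reversed(kept))  # restore chronological order

history = [
    {"role": "user", "content": "tell me about context windows please"},
    {"role": "assistant", "content": "they bound how much text fits"},
    {"role": "user", "content": "summarise that"},
]
print(trim_history(history, budget=8))  # oldest message gets dropped
```

Real systems often summarize dropped turns instead of discarding them outright, trading a little fidelity for a much longer effective memory.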
@@ -1 +1,8 @@
-# Chat Completions API
+# Chat Completions API
+
+The OpenAI Chat Completions API is a powerful interface that allows developers to integrate conversational AI into applications by utilizing models like GPT-3.5 and GPT-4. It is designed to manage multi-turn conversations, keeping context across interactions, making it ideal for chatbots, virtual assistants, and interactive AI systems. With the API, users can structure conversations by providing messages in a specific format, where each message has a role (e.g., "system" to guide the model, "user" for input, and "assistant" for responses).
+
+Learn more from the following resources:
+
+- [@official@Create Chat Completions](https://platform.openai.com/docs/api-reference/chat/create)
+- [@article@Getting Started with OpenAI's Chat Completions API](https://medium.com/the-ai-archives/getting-started-with-openais-chat-completions-api-in-2024-462aae00bf0a)
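The message format described above can be built as a plain list of role/content dictionaries. The sketch below constructs such a request payload; `build_chat_request` is an illustrative helper (actually sending it would use the official `openai` client and an API key, which are omitted here):

```python
# Sketch of a Chat Completions request payload: a system message followed by
# alternating user/assistant turns. `build_chat_request` is illustrative only.

def build_chat_request(model: str, system_prompt: str,
                       turns: list[tuple[str, str]]) -> dict:
    """turns: (role, content) pairs from the ongoing conversation."""
    messages = [{"role": "system", "content": system_prompt}]
    messages += [{"role": role, "content": content} for role, content in turns]
    return {"model": model, "messages": messages}

request = build_chat_request(
    "gpt-4",
    "You are a terse assistant.",
    [("user", "What is a context window?"),
     ("assistant", "The text span a model can attend to."),
     ("user", "And a token?")],
)
print(request["messages"][0]["role"])  # the system message always leads
```

Keeping the assistant's earlier replies in `messages` is what gives the model conversational memory; the API itself is stateless between calls.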
@@ -1 +1,9 @@
-# Chunking
+# Chunking
+
+The chunking step in Retrieval-Augmented Generation (RAG) involves breaking down large documents or data sources into smaller, manageable chunks. This is done to ensure that the retriever can efficiently search through large volumes of data while staying within the token or input limits of the model. Each chunk, typically a paragraph or section, is converted into an embedding, and these embeddings are stored in a vector database. When a query is made, the retriever searches for the most relevant chunks rather than the entire document, enabling faster and more accurate retrieval.
+
+Learn more from the following resources:
+
+- [@article@Understanding LangChain's RecursiveCharacterTextSplitter](https://dev.to/eteimz/understanding-langchains-recursivecharactertextsplitter-2846)
+- [@article@Chunking Strategies for LLM Applications](https://www.pinecone.io/learn/chunking-strategies/)
+- [@article@A Guide to Chunking Strategies for Retrieval Augmented Generation](https://zilliz.com/learn/guide-to-chunking-strategies-for-rag)
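The simplest version of the chunking step above is a fixed-size window with overlap, so that sentences cut at a boundary still appear whole in the neighbouring chunk. This is a hand-rolled sketch of that strategy, not LangChain's `RecursiveCharacterTextSplitter` (which additionally tries to split on separators like paragraphs and sentences first):

```python
# Sketch of fixed-size chunking with overlap, the simplest chunking strategy.

def chunk_text(text: str, size: int, overlap: int) -> list[str]:
    """Split `text` into windows of `size` chars, overlapping by `overlap`."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than chunk size")
    chunks, step = [], size - overlap
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):  # last window reached the end
            break
    return chunks

doc = "Retrieval-Augmented Generation splits documents before embedding them."
for chunk in chunk_text(doc, size=30, overlap=5):
    print(repr(chunk))
```

Each resulting chunk would then be embedded and stored in the vector database; the overlap parameter trades storage for retrieval recall at chunk boundaries.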
@@ -1 +1,10 @@
-# Code Completion Tools
+# Code Completion Tools
+
+Code completion tools are AI-powered development assistants designed to enhance productivity by automatically suggesting code snippets, functions, and entire blocks of code as developers type. These tools, such as GitHub Copilot and Tabnine, leverage machine learning models trained on vast code repositories to predict and generate contextually relevant code. They help reduce repetitive coding tasks, minimize errors, and accelerate the development process by offering real-time, intelligent suggestions.
+
+Learn more from the following resources:
+
+- [@official@GitHub Copilot](https://github.com/features/copilot)
+- [@official@Codeium](https://codeium.com/)
+- [@official@Supermaven](https://supermaven.com/)
+- [@official@Tabnine](https://www.tabnine.com/)
@@ -1 +1,8 @@
-# Cohere
+# Cohere
+
+Cohere is an AI platform that specializes in natural language processing (NLP) by providing large language models designed to help developers build and deploy text-based applications. Cohere’s models are used for tasks such as text classification, language generation, semantic search, and sentiment analysis. Unlike some other providers, Cohere emphasizes simplicity and scalability, offering an easy-to-use API that allows developers to fine-tune models on custom data for specific use cases. Additionally, Cohere provides robust multilingual support and focuses on ensuring that its NLP solutions are both accessible and enterprise-ready, catering to a wide range of industries.
+
+Learn more from the following resources:
+
+- [@official@Cohere Website](https://cohere.com/)
+- [@article@What Does Cohere Do?](https://medium.com/geekculture/what-does-cohere-do-cdadf6d70435)
@@ -1 +1,8 @@
-# Conducting adversarial testing
+# Conducting adversarial testing
+
+Adversarial testing involves intentionally exposing machine learning models to deceptive, perturbed, or carefully crafted inputs to evaluate their robustness and identify vulnerabilities. The goal is to simulate potential attacks or edge cases where the model might fail, such as subtle manipulations in images, text, or data that cause the model to misclassify or produce incorrect outputs. This type of testing helps to improve model resilience, particularly in sensitive applications like cybersecurity, autonomous systems, and finance.
+
+Learn more from the following resources:
+
+- [@article@Adversarial Testing for Generative AI](https://developers.google.com/machine-learning/resources/adv-testing)
+- [@article@Adversarial Testing: Definition, Examples and Resources](https://www.leapwork.com/blog/adversarial-testing)
@@ -1 +1,8 @@
-# Constraining outputs and inputs
+# Constraining outputs and inputs
+
+Constraining outputs and inputs in AI models refers to implementing limits or rules that guide both the data the model processes (inputs) and the results it generates (outputs). Input constraints ensure that only valid, clean, and well-formed data enters the model, which helps to reduce errors and improve performance. This can include setting data type restrictions, value ranges, or specific formats. Output constraints, on the other hand, ensure that the model produces appropriate, safe, and relevant results, often by limiting output length, specifying answer formats, or applying filters to avoid harmful or biased responses. These constraints are crucial for improving model safety, alignment, and utility in practical applications.
+
+Learn more from the following resources:
+
+- [@article@Preventing Prompt Injection](https://learnprompting.org/docs/prompt_hacking/defensive_measures/introduction)
+- [@article@Introducing Structured Outputs in the API - OpenAI](https://openai.com/index/introducing-structured-outputs-in-the-api/)
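One common form of output constraint is validating the model's raw text against an expected shape before the application uses it. This is an illustrative sketch only: the `{"sentiment", "confidence"}` schema is made up, and production systems would typically combine this with the API-side structured-outputs feature linked above:

```python
import json

# Sketch: reject any model output that is not well-formed JSON matching an
# expected schema. Schema and field names here are illustrative.

ALLOWED_SENTIMENTS = {"positive", "neutral", "negative"}

def parse_constrained_output(raw: str) -> dict:
    """Accept only objects like {"sentiment": ..., "confidence": 0..1}."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        raise ValueError("output is not valid JSON")
    if data.get("sentiment") not in ALLOWED_SENTIMENTS:
        raise ValueError("sentiment outside the allowed set")
    confidence = data.get("confidence")
    if not isinstance(confidence, (int, float)) or not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be a number in [0, 1]")
    return data

print(parse_constrained_output('{"sentiment": "positive", "confidence": 0.9}'))
```

Failing closed like this (raise, then retry or fall back) keeps malformed or out-of-range generations from propagating into downstream logic.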
@@ -1 +1,8 @@
-# Cut-off Dates / Knowledge
+# Cut-off Dates / Knowledge
+
+OpenAI models, such as GPT-3.5 and GPT-4, have a knowledge cutoff date, which refers to the last point in time when the model was trained on data. For instance, as of the current version of GPT-4, the knowledge cutoff is October 2023. This means the model does not have awareness or knowledge of events, advancements, or data that occurred after that date. Consequently, the model may lack information on more recent developments, research, or real-time events unless explicitly updated in future versions. This limitation is important to consider when using the models for time-sensitive tasks or inquiries involving recent knowledge.
+
+Learn more from the following resources:
+
+- [@article@Knowledge Cutoff Dates of all LLMs explained](https://otterly.ai/blog/knowledge-cutoff/)
+- [@article@Knowledge Cutoff Dates For ChatGPT, Meta Ai, Copilot, Gemini, Claude](https://computercity.com/artificial-intelligence/knowledge-cutoff-dates-llms)
