NLP & Large Language Models

Language models, fine-tuning, prompt engineering, RAG

What is a large language model (LLM)?

A large language model is a neural network trained on massive text datasets to predict the next token, which lets it generate human-like text. LLMs use transformer architectures with billions of parameters, enabling tasks like translation, summarization, and question answering, often without task-specific training. [Source: Google Research]

Sources

Attention Is All You Need

academic · arXiv (Google Brain / Google Research) · 2017-06-12

Large Language Models: A New Moore's Law?

primary · Google Research · 2021-10-04

How do transformer models work in NLP?

Transformers process entire input sequences simultaneously using a self-attention mechanism that weighs relationships between all tokens at once. This parallelism, introduced in the 2017 'Attention Is All You Need' paper, replaced recurrent networks and enabled scaling to billions of parameters efficiently. [Source: Google Brain / arXiv]
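The self-attention step described above can be sketched in plain Python. This is a minimal illustration, not the paper's implementation: the function names and toy vectors are assumptions, and a real layer would first apply learned query, key, and value projections and use multiple heads.

```python
import math

def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(queries, keys, values):
    """Scaled dot-product attention over a whole sequence."""
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        # Score this token against every token in the sequence.
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
            for k in keys
        ]
        weights = softmax(scores)
        # The output is a weighted average of all value vectors.
        out = [
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ]
        outputs.append(out)
    return outputs

# Three toy token vectors used as queries, keys, and values alike
# (hypothetical data; no learned projections are applied here).
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
attended = self_attention(tokens, tokens, tokens)
```

Each output row depends on every input token, and no row depends on another output row, which is why the computation can run in parallel across positions.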

Sources

Attention Is All You Need

academic · arXiv (Google Brain / Google Research) · 2017-06-12

What is prompt engineering and why does it matter?

Prompt engineering is the practice of designing inputs to guide LLM outputs toward desired results. Because LLMs are sensitive to phrasing, few-shot examples, and instruction framing, well-crafted prompts can dramatically improve accuracy and relevance without modifying model weights. [Source: OpenAI]

Sources

Prompt Engineering – OpenAI Platform Documentation

official · OpenAI · 2024-01-01

What is fine-tuning an LLM and when should you use it?

Fine-tuning updates a pretrained LLM's weights on a smaller, domain-specific dataset to improve performance on targeted tasks. It is best used when consistent style, specialized vocabulary, or domain accuracy is required and cannot be achieved reliably through prompt engineering alone. [Source: OpenAI]

Sources

Fine-tuning – OpenAI Platform Documentation

official · OpenAI · 2024-01-01

What is Retrieval-Augmented Generation (RAG)?

Retrieval-Augmented Generation combines a retrieval system (typically a vector database) with an LLM so the model can access up-to-date or proprietary documents at inference time. Retrieved passages are added to the prompt, which helps reduce hallucinations and extends knowledge beyond the model's training cutoff without retraining. [Source: Meta AI / arXiv]
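A rough sketch of the retrieve-then-prompt flow follows. To stay self-contained it swaps the vector database for a toy word-overlap retriever. The function names, prompt wording, and sample documents are all assumptions made for illustration, and the actual LLM call is left out.

```python
def overlap_score(question, document):
    """Toy relevance score: number of shared lowercase words."""
    q_words = set(question.lower().split())
    d_words = set(document.lower().split())
    return len(q_words & d_words)

def retrieve(question, documents, k=1):
    """Return the k documents that share the most words with the question."""
    ranked = sorted(
        documents,
        key=lambda doc: overlap_score(question, doc),
        reverse=True,
    )
    return ranked[:k]

def build_rag_prompt(question, documents, k=1):
    """Place the retrieved passages in the prompt ahead of the question."""
    context = "\n".join(f"- {doc}" for doc in retrieve(question, documents, k))
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {question}\n"
        "Answer:"
    )

# Hypothetical knowledge base
docs = [
    "The refund window is 30 days from delivery.",
    "Support is available on weekdays from 9 to 5.",
]
prompt = build_rag_prompt("How long is the refund window?", docs)
```

The resulting prompt would then be sent to the model. A production system would typically rank by embedding similarity rather than word overlap.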

Sources

Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks

academic · arXiv (Meta AI Research) · 2020-05-22

What is Reinforcement Learning from Human Feedback (RLHF)?

RLHF is a training technique where human raters rank model outputs, and those preferences are used to train a reward model that then guides further LLM fine-tuning via reinforcement learning. OpenAI used RLHF to align InstructGPT and ChatGPT with human intent. [Source: OpenAI / arXiv]
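The reward-model step can be illustrated with a pairwise preference loss. The sketch below uses a common formulation, the negative log-sigmoid of the reward difference between the preferred and rejected output. The function name and example rewards are assumptions, and the reinforcement learning stage is not shown.

```python
import math

def pairwise_preference_loss(reward_chosen, reward_rejected):
    """Return -log(sigmoid(reward_chosen - reward_rejected)).

    The loss is small when the reward model scores the human-preferred
    output well above the rejected one, and large when it does not.
    """
    diff = reward_chosen - reward_rejected
    # -log(sigmoid(x)) is equal to log(1 + exp(-x)).
    return math.log(1.0 + math.exp(-diff))

# Hypothetical scalar rewards for two candidate outputs
agrees_with_rater = pairwise_preference_loss(2.0, 0.0)
disagrees_with_rater = pairwise_preference_loss(0.0, 2.0)
```

Minimizing this loss over many ranked pairs pushes the reward model to agree with the raters, and the trained reward model then supplies the signal for the reinforcement learning step.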

Sources

Training language models to follow instructions with human feedback

academic · arXiv (OpenAI) · 2022-03-04

What is few-shot prompting in large language models?

Few-shot prompting provides an LLM with a small number of input-output examples directly in the prompt to demonstrate the desired task format. GPT-3's original paper showed that this in-context learning approach enables strong performance on new tasks without any gradient updates. [Source: OpenAI / arXiv]
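As a small illustration, a few-shot prompt can be assembled from example pairs. The "Input:" and "Output:" labels and the function name are assumptions; any consistent format would serve.

```python
def build_few_shot_prompt(instruction, examples, query):
    """Lay out the instruction, the worked examples, then the open query."""
    lines = [instruction, ""]
    for source, target in examples:
        lines.append(f"Input: {source}")
        lines.append(f"Output: {target}")
        lines.append("")
    # The final item has no output, so the model completes it.
    lines.append(f"Input: {query}")
    lines.append("Output:")
    return "\n".join(lines)

prompt = build_few_shot_prompt(
    "Translate English to French.",
    [("cheese", "fromage"), ("bread", "pain")],
    "apple",
)
```

The examples live entirely in the prompt text, so no model weights change.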

Sources

Language Models are Few-Shot Learners

academic · arXiv (OpenAI) · 2020-05-28

What is chain-of-thought prompting?

Chain-of-thought (CoT) prompting encourages LLMs to produce intermediate reasoning steps before a final answer, mimicking human problem-solving. Google Research demonstrated that this technique substantially improves accuracy on arithmetic, commonsense, and symbolic reasoning benchmarks across large models. [Source: Google Research / arXiv]
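One way to apply this in practice is to include a worked example whose answer spells out its reasoning, then parse the final answer from the completion. The sketch below assumes a "The answer is N" convention. The example question, helper names, and regular expression are illustrative choices, not part of the cited paper.

```python
import re

# A single worked example that shows its intermediate steps (invented here).
COT_EXAMPLE = (
    "Q: A shelf holds 3 boxes with 4 books each. How many books are there?\n"
    "A: Each box has 4 books and there are 3 boxes, so 3 * 4 = 12. "
    "The answer is 12.\n"
)

def build_cot_prompt(question):
    """Prepend the worked example so the model imitates step-by-step answers."""
    return f"{COT_EXAMPLE}\nQ: {question}\nA:"

def extract_final_answer(completion):
    """Pull the number that follows 'The answer is', or None if absent."""
    match = re.search(r"The answer is\s+(-?\d+(?:\.\d+)?)", completion)
    return match.group(1) if match else None

prompt = build_cot_prompt("A bag has 5 red and 6 blue marbles. How many marbles?")
answer = extract_final_answer("There are 5 + 6 = 11 marbles. The answer is 11.")
```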

Sources

Chain-of-Thought Prompting Elicits Reasoning in Large Language Models

academic · arXiv (Google Research) · 2022-01-28

What is BERT and how does it differ from GPT-style models?

BERT (Bidirectional Encoder Representations from Transformers) is a Google model pretrained to understand context from both directions simultaneously, making it strong at classification and extraction tasks. GPT-style models are unidirectional, predicting the next token, making them better suited for text generation. [Source: Google Research / arXiv]
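The directional difference comes down to which positions each token is allowed to attend to. A minimal sketch of the two attention masks, with assumed function names (1 means "may attend", 0 means "blocked"):

```python
def bidirectional_mask(n):
    """BERT-style: every token can see every other token."""
    return [[1] * n for _ in range(n)]

def causal_mask(n):
    """GPT-style: token i can see only positions 0..i."""
    return [[1 if j <= i else 0 for j in range(n)] for i in range(n)]

for row in causal_mask(4):
    print(row)
```

The lower-triangular causal mask is what lets a GPT-style model generate text one token at a time, since no position ever depends on tokens to its right.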

Sources

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

academic · arXiv (Google AI Language) · 2018-10-11

What is LoRA and how does it make LLM fine-tuning more efficient?

Low-Rank Adaptation (LoRA) fine-tunes LLMs by freezing the pretrained weights and injecting small trainable rank-decomposition matrices into model layers rather than updating all weights. In the paper's GPT-3 175B experiments, this reduced trainable parameters by up to 10,000× and GPU memory requirements by about 3×, putting fine-tuning within reach of far more modest hardware. [Source: Microsoft Research / arXiv]
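To make the idea concrete, the sketch below counts parameters for one weight matrix and computes a LoRA-style forward pass, W x + scale * B (A x). The matrix sizes, rank, and helper names are assumptions chosen for illustration. The per-matrix saving shown here is much smaller than the whole-model figure quoted above, which depends on which layers are adapted.

```python
def lora_param_counts(d_out, d_in, rank):
    """Trainable parameters: full update vs. low-rank factors B and A."""
    full = d_out * d_in
    lora = rank * (d_out + d_in)  # B is d_out x rank, A is rank x d_in
    return full, lora

def matvec(matrix, vector):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

def lora_forward(W, A, B, x, scale=1.0):
    """Compute W x + scale * B (A x); W stays frozen, only A and B train."""
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    return [b + scale * u for b, u in zip(base, update)]

# Hypothetical 4096 x 4096 layer adapted at rank 8
full, lora = lora_param_counts(4096, 4096, 8)
print(full, lora, full // lora)

# Tiny rank-1 example: W is 2x2, A is 1x2, B is 2x1
y = lora_forward([[1, 0], [0, 1]], [[1, 1]], [[2], [3]], [1, 2])
```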

Sources

LoRA: Low-Rank Adaptation of Large Language Models

academic · arXiv (Microsoft Research) · 2021-06-17

What is QLoRA and how does quantization help LLM fine-tuning?

QLoRA combines 4-bit quantization with LoRA adapters, allowing a 65-billion-parameter model to be fine-tuned on a single 48GB GPU without significant accuracy loss. Developed at the University of Washington, it enables high-quality instruction tuning on widely available hardware. [Source: University of Washington / arXiv]
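The memory arithmetic behind that claim can be checked directly, alongside a toy quantizer. Note the loud assumption: the quantizer below is a plain symmetric absmax scheme with integer codes in [-7, 7], used as a simpler stand-in. It is not the data type or blocking scheme from the QLoRA paper, and the figures cover weights only, ignoring activations, adapters, and optimizer state.

```python
def weight_memory_gb(num_params, bits_per_param):
    """Memory needed to store the weights alone, in gigabytes (1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9

def quantize_absmax_4bit(block):
    """Map floats to integer codes in [-7, 7] using the largest magnitude."""
    absmax = max(abs(x) for x in block) or 1.0  # avoid dividing by zero
    scale = absmax / 7
    codes = [round(x / scale) for x in block]
    return codes, scale

def dequantize(codes, scale):
    """Recover approximate floats from the integer codes."""
    return [c * scale for c in codes]

fp16_gb = weight_memory_gb(65e9, 16)  # 65B parameters at 16 bits each
int4_gb = weight_memory_gb(65e9, 4)   # the same weights at 4 bits each
print(fp16_gb, int4_gb)

codes, scale = quantize_absmax_4bit([7.0, -3.0, 0.0, 1.0])
restored = dequantize(codes, scale)
```

At 16 bits the weights alone exceed a 48GB card, while at 4 bits they leave room for the small LoRA adapters and training state.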

Sources

QLoRA: Efficient Finetuning of Quantized LLMs

academic · arXiv (University of Washington) · 2023-05-23

What is a vector database and how is it used in LLM applications?

A vector database stores high-dimensional numerical embeddings of text, images, or other data, enabling fast semantic similarity search. In LLM pipelines — especially RAG — vector databases retrieve contextually relevant documents to inject into prompts, grounding model responses in specific knowledge. [Source: IEEE]
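A minimal in-memory stand-in shows the core operations, adding vectors and searching by similarity. The class name, payload strings, and 3-dimensional vectors are assumptions for illustration. Real vector databases use approximate nearest-neighbor indexes rather than the brute-force scan shown here.

```python
import math

class TinyVectorStore:
    """In-memory stand-in for a vector database (brute-force search)."""

    def __init__(self):
        self._items = []  # list of (unit_vector, payload) pairs

    @staticmethod
    def _normalize(vector):
        norm = math.sqrt(sum(x * x for x in vector))
        if norm == 0:
            raise ValueError("cannot index a zero vector")
        return [x / norm for x in vector]

    def add(self, vector, payload):
        """Store a unit-length copy of the vector with its payload."""
        self._items.append((self._normalize(vector), payload))

    def search(self, query, k=1):
        """Return the k (similarity, payload) pairs closest to the query."""
        q = self._normalize(query)
        scored = [
            (sum(a * b for a, b in zip(q, vec)), payload)
            for vec, payload in self._items
        ]
        # On unit vectors the dot product equals cosine similarity.
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:k]

store = TinyVectorStore()
store.add([0.9, 0.1, 0.0], "refund policy")   # hypothetical embedding
store.add([0.0, 0.2, 0.9], "office hours")    # hypothetical embedding
top = store.search([1.0, 0.0, 0.1], k=1)
```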

Sources

Vector Database Management Systems: Fundamental Concepts, Use-cases, and Current Challenges

academic · IEEE · 2024-05-01

What are text embeddings in NLP?

Text embeddings are dense numerical vector representations of words, sentences, or documents that capture semantic meaning, so that similar meanings map to nearby points in vector space. They are foundational to search, classification, clustering, and retrieval tasks in modern NLP systems. [Source: Google Research / arXiv]
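"Nearby points" is usually measured with cosine similarity. The sketch below uses hand-written 3-dimensional vectors invented purely for illustration. Real embeddings come from a trained model and have hundreds or thousands of dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy vectors, not produced by any real embedding model
toy_embeddings = {
    "cat": [0.9, 0.8, 0.1],
    "kitten": [0.85, 0.9, 0.15],
    "invoice": [0.1, 0.05, 0.95],
}

close = cosine_similarity(toy_embeddings["cat"], toy_embeddings["kitten"])
far = cosine_similarity(toy_embeddings["cat"], toy_embeddings["invoice"])
print(close > far)
```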

Sources

Efficient Estimation of Word Representations in Vector Space

academic · arXiv (Google Research) · 2013-01-16

Get text embeddings – Vertex AI Generative AI

official · Google Cloud · 2024-03-01

How can you reduce hallucinations in large language models?

Key strategies include grounding responses via Retrieval-Augmented Generation, providing explicit source documents in context, using temperature settings closer to zero, applying output validation layers, and fine-tuning on high-quality factual data. No single technique eliminates hallucinations entirely; combinations work best. [Source: Stanford HAI]
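Of these strategies, the temperature setting is the easiest to show numerically. The sketch below rescales a set of made-up logits before the softmax. The function name and values are assumptions, and real decoders combine temperature with other sampling controls.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities after dividing by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive; use argmax for greedy decoding")
    scaled = [value / temperature for value in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]  # hypothetical scores for three candidate tokens
sharp = softmax_with_temperature(logits, 0.2)  # near-deterministic
flat = softmax_with_temperature(logits, 1.5)   # more random
print(sharp[0] > flat[0])
```

Lower temperatures concentrate probability on the top-scoring token, which makes output more repeatable but does not by itself make it correct.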

Sources

The Hallucination Crisis

academic · Stanford Human-Centered AI Institute (HAI) · 2023-08-07

Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks

academic · arXiv (Meta AI Research) · 2020-05-22

What is Constitutional AI (CAI)?

Constitutional AI is Anthropic's alignment method where an LLM critiques and revises its own outputs according to a set of written principles (a 'constitution'), reducing the need for human labelers to evaluate harmful content directly. It improves harmlessness while maintaining helpfulness. [Source: Anthropic / arXiv]

Sources

Constitutional AI: Harmlessness from AI Feedback

academic · arXiv (Anthropic) · 2022-12-15

What are the main safety risks associated with large language models?

Primary LLM risks include generating harmful content or false content (hallucinations), leaking private training data, enabling cyberattacks via code generation, producing biased outputs, and being misused for disinformation. NIST's AI Risk Management Framework provides a structured approach to identifying and mitigating these risks. [Source: NIST]

Sources

AI Risk Management Framework (AI RMF 1.0)

primary · National Institute of Standards and Technology (NIST) · 2023-01-26

What is a context window in an LLM, and why does size matter?

A context window is the maximum number of tokens an LLM can process in a single inference pass, covering both input and output. Larger windows — now reaching 1 million tokens in some models — allow processing of entire codebases or books, enabling richer reasoning and document-level tasks. [Source: Google DeepMind]
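Because input and output share the same window, applications usually budget tokens before calling the model. A minimal sketch, with assumed function names and a simple keep-the-most-recent truncation policy:

```python
def fits_context(prompt_tokens, max_output_tokens, context_window):
    """True if the prompt plus the reserved output fits in the window."""
    return prompt_tokens + max_output_tokens <= context_window

def truncate_to_budget(token_ids, max_output_tokens, context_window):
    """Keep the most recent tokens that still leave room for the output."""
    budget = context_window - max_output_tokens
    if budget <= 0:
        raise ValueError("output reservation leaves no room for input")
    return token_ids[-budget:]

# Hypothetical numbers: an 8-token window with 4 tokens reserved for output
kept = truncate_to_budget(list(range(10)), max_output_tokens=4, context_window=8)
print(kept)
```

Dropping the oldest tokens is only one possible policy. Summarizing or retrieving relevant passages are common alternatives.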

Sources

Gemini: A Family of Highly Capable Multimodal Models

academic · arXiv (Google DeepMind) · 2023-12-19

What is tokenization in NLP and how does it affect LLM performance?

Tokenization splits raw text into subword units (tokens) drawn from a vocabulary learned by algorithms like Byte-Pair Encoding (BPE) or WordPiece. Token count directly impacts cost, context length usage, and model behavior. For example, rare words or non-English text typically consume more tokens per character. [Source: OpenAI]
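The BPE learning loop can be sketched in a few functions: count adjacent symbol pairs, merge the most frequent pair, and repeat. This is a simplified illustration. The word counts are a toy corpus, the function names are assumptions, and details such as end-of-word markers and byte-level handling are omitted.

```python
from collections import Counter

def count_pairs(vocab):
    """Count adjacent symbol pairs, weighted by word frequency."""
    pairs = Counter()
    for symbols, freq in vocab.items():
        for left, right in zip(symbols, symbols[1:]):
            pairs[(left, right)] += freq
    return pairs

def merge_pair(vocab, pair):
    """Replace every occurrence of `pair` with one merged symbol."""
    merged = {}
    for symbols, freq in vocab.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = tuple(out)
        merged[key] = merged.get(key, 0) + freq
    return merged

def learn_bpe(word_counts, num_merges):
    """Start from characters and greedily merge the most frequent pair."""
    vocab = {tuple(word): freq for word, freq in word_counts.items()}
    merges = []
    for _ in range(num_merges):
        pairs = count_pairs(vocab)
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # ties go to the first pair seen
        vocab = merge_pair(vocab, best)
        merges.append(best)
    return merges, vocab

merges, vocab = learn_bpe({"low": 5, "lower": 2, "newest": 6, "widest": 3}, 3)
print(merges)
```

Frequent character sequences such as "est" become single tokens, while rare strings stay split into many small pieces, which is why unusual words cost more tokens.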

Sources

Tokenizer – OpenAI Platform

official · OpenAI · 2024-01-01

Neural Machine Translation of Rare Words with Subword Units

academic · arXiv (University of Edinburgh) · 2015-08-31

What is the relationship between LLM parameter count and model capability?

Parameter count measures the total number of learnable weights in a model; larger counts generally yield better performance, but with diminishing returns. DeepMind's Chinchilla research showed that many large models were undertrained: for a fixed compute budget, the number of training tokens should grow in proportion to the number of parameters, not just model size alone. [Source: DeepMind / arXiv]
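The proportional-scaling idea can be turned into back-of-the-envelope arithmetic. The sketch below rests on two assumptions that should be read as rules of thumb rather than exact results: a ratio of about 20 training tokens per parameter (the ratio of Chinchilla's own 1.4 trillion tokens to its 70 billion parameters) and the common estimate of roughly 6 FLOPs per parameter per training token.

```python
# Rule of thumb: 1.4e12 tokens / 70e9 parameters for Chinchilla itself
TOKENS_PER_PARAM = 20

def compute_optimal_tokens(num_params, tokens_per_param=TOKENS_PER_PARAM):
    """Estimate the training tokens that balance a given model size."""
    return num_params * tokens_per_param

def training_flops(num_params, num_tokens):
    """Approximate training compute as 6 * parameters * tokens."""
    return 6 * num_params * num_tokens

tokens = compute_optimal_tokens(70e9)   # about 1.4e12 tokens
flops = training_flops(70e9, tokens)    # about 5.9e23 FLOPs
print(f"{tokens:.2e} tokens, {flops:.2e} FLOPs")
```

Doubling the parameter count under this heuristic also doubles the recommended token count, so the compute budget grows roughly fourfold.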

Sources

Training Compute-Optimal Large Language Models

academic · arXiv (DeepMind) · 2022-03-29

When should you use RAG versus fine-tuning for an LLM application?

RAG is preferred when information changes frequently, when answers must cite sources, or when the corpus is too large to train on, because it injects knowledge at runtime. Fine-tuning is better for teaching consistent tone, style, or domain-specific behavior that is stable over time. Many production systems combine both approaches. [Source: Stanford HAI]

Sources

Reflections on Foundation Models and Generative AI

academic · Stanford Human-Centered AI Institute (HAI) · 2023-03-01

Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks

academic · arXiv (Meta AI Research) · 2020-05-22

Fine-tuning – OpenAI Platform Documentation

official · OpenAI · 2024-01-01
