7 LLM Generation Parameters

...explained visually

Sep 03, 2025


Every generation from an LLM is shaped by parameters under the hood.

Knowing how to tune them lets you produce sharper, more controlled outputs.

Here are the 7 levers that matter most:

Max tokens:
- This is a hard cap on how many tokens the model can generate in one response.
- Too low → truncated outputs; too high → nothing stops a model that rambles, which wastes compute, latency, and cost.
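To make the cap concrete, here is a minimal sketch of a decoding loop. It assumes a hypothetical `fake_model` function that stands in for the model. The `"stop"`/`"length"` labels mirror the finish reasons many chat APIs report, but here they are an assumption, not any specific provider's API.

```python
from typing import Callable, List, Tuple

def generate(next_token: Callable[[List[str]], str], max_tokens: int,
             eos: str = "<eos>") -> Tuple[List[str], str]:
    """Toy decoding loop: stop at end-of-sequence or when the cap is hit."""
    out: List[str] = []
    while len(out) < max_tokens:
        tok = next_token(out)
        if tok == eos:
            return out, "stop"      # model finished on its own
        out.append(tok)
    return out, "length"            # hard cap reached: output is truncated

SCRIPT = ["The", "answer", "is", "forty", "two"]

def fake_model(prefix: List[str]) -> str:
    """Hypothetical stand-in for a model: emits SCRIPT, then end-of-sequence."""
    return SCRIPT[len(prefix)] if len(prefix) < len(SCRIPT) else "<eos>"

print(generate(fake_model, max_tokens=3))   # (['The', 'answer', 'is'], 'length')
print(generate(fake_model, max_tokens=50))  # (['The', 'answer', 'is', 'forty', 'two'], 'stop')
```

The second call finishes after five tokens even though the cap is 50. A generous cap costs nothing if the model ends on its own. The waste comes when the model keeps going.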
Temperature:
- Governs randomness by rescaling the logits before the softmax. Low temperature (~0) makes the model nearly deterministic (close to always picking the most likely token).
- Higher temperature (0.7–1.0) boosts creativity and diversity, but also noise.
- Use case: lower for QA/chatbots, higher for brainstorming/creative tasks.
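Here is a minimal sketch of what temperature does. It assumes the common formulation, where logits are divided by T before the softmax. The logits below are made up for illustration.

```python
import math
from typing import Dict

def softmax_with_temperature(logits: Dict[str, float], temperature: float) -> Dict[str, float]:
    """Divide logits by T, then softmax. T < 1 sharpens, T > 1 flattens."""
    if temperature <= 0:
        raise ValueError("temperature must be > 0 (use greedy argmax for T = 0)")
    scaled = {tok: z / temperature for tok, z in logits.items()}
    m = max(scaled.values())                      # subtract max for numerical stability
    exps = {tok: math.exp(z - m) for tok, z in scaled.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}

logits = {"cat": 2.0, "dog": 1.0, "car": 0.1}     # made-up logits
for t in (0.2, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, {tok: round(p, 3) for tok, p in probs.items()})
```

In the printed rows, probability mass concentrates on "cat" as T drops and spreads out as T rises. Implementations usually treat T = 0 as greedy argmax, because dividing by zero is undefined.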
Top-k:
- Without top-k, the next token is sampled from the full vocabulary, in proportion to each token's probability.
- This parameter restricts sampling to the k most probable tokens.
- Example: k=5 → the model only considers the 5 most likely next tokens during sampling.
- Helps enforce focus, but an overly small k can make outputs repetitive.
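Here is a sketch of top-k filtering on a toy next-token distribution. The tokens and probabilities are invented for illustration. The filter keeps the k largest probabilities, renormalizes them, and then samples.

```python
import random
from typing import Dict

def top_k_filter(probs: Dict[str, float], k: int) -> Dict[str, float]:
    """Keep the k most probable tokens and renormalize so they sum to 1."""
    kept = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k]
    total = sum(p for _, p in kept)
    return {tok: p / total for tok, p in kept}

def sample(probs: Dict[str, float], rng: random.Random) -> str:
    """Draw one token in proportion to its (renormalized) probability."""
    tokens, weights = zip(*probs.items())
    return rng.choices(tokens, weights=weights, k=1)[0]

probs = {"the": 0.40, "a": 0.25, "this": 0.15, "that": 0.10, "my": 0.06, "zebra": 0.04}
filtered = top_k_filter(probs, k=2)
print(filtered)                            # only "the" and "a" survive
print(sample(filtered, random.Random(0)))  # always one of the two survivors
```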
Top-p (nucleus sampling):
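- Instead of sampling from all tokens or a fixed top k, the model samples from the smallest set of tokens whose cumulative probability reaches p.
- Example: top_p=0.9 → only the smallest set of tokens covering 90% of the probability mass is considered.
- More adaptive than top-k: the candidate set shrinks when the model is confident and grows when it is uncertain, which helps balance coherence with diversity.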
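Here is a sketch of nucleus (top-p) filtering in the same toy setup. The two distributions are invented to show the adaptive behavior. With the same p, a confident distribution keeps a single token, while a flat one keeps several.

```python
from typing import Dict

def top_p_filter(probs: Dict[str, float], p: float) -> Dict[str, float]:
    """Keep the smallest high-probability set whose cumulative mass reaches p."""
    kept, cumulative = [], 0.0
    for tok, prob in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        kept.append((tok, prob))
        cumulative += prob
        if cumulative >= p:
            break
    total = sum(prob for _, prob in kept)
    return {tok: prob / total for tok, prob in kept}

confident = {"Paris": 0.92, "Lyon": 0.05, "Rome": 0.02, "Oslo": 0.01}
uncertain = {"red": 0.30, "blue": 0.28, "green": 0.22, "black": 0.12, "pink": 0.08}
print(top_p_filter(confident, 0.9))   # 1 token survives
print(top_p_filter(uncertain, 0.9))   # 4 tokens survive
```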
Frequency penalty:
- Penalizes tokens in proportion to how many times they have already appeared.
- Positive values discourage repetition; negative values encourage it.
- Useful for summarization (avoid redundancy) or poetry (intentional repetition).
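Here is a sketch of a frequency penalty applied to raw logits. It is modeled on the formula OpenAI documents for its API (`logit - count * alpha`). Other providers may implement the penalty differently, and the tokens and values here are made up.

```python
from collections import Counter
from typing import Dict, List

def apply_frequency_penalty(logits: Dict[str, float], generated: List[str],
                            alpha: float) -> Dict[str, float]:
    """Subtract alpha once per previous occurrence: logit - count * alpha."""
    counts = Counter(generated)                  # missing tokens count as 0
    return {tok: z - counts[tok] * alpha for tok, z in logits.items()}

logits = {"very": 3.0, "quite": 2.5, "rather": 2.0}
history = ["very", "very", "very", "quite"]
print(apply_frequency_penalty(logits, history, alpha=0.5))
# "very" drops by 1.5 (3 uses), "quite" by 0.5, "rather" is untouched
```

A negative `alpha` flips the sign, so repeated tokens get a boost instead of a penalty.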
Presence penalty:
- Applies a flat, one-time penalty to every token that has appeared at least once (regardless of how often), nudging the model toward new tokens.
- Higher values push for novelty, lower values make the model stick to known patterns.
- Handy for exploratory generation where diversity of ideas is valued.
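Here is the presence penalty, sketched in the same style. It is again modeled on OpenAI's documented formula, which subtracts the penalty once for any token whose count is above zero. Unlike the frequency penalty, the number of repeats no longer matters. The values are illustrative.

```python
from typing import Dict, List

def apply_presence_penalty(logits: Dict[str, float], generated: List[str],
                           alpha: float) -> Dict[str, float]:
    """Subtract a flat alpha from every token that has appeared at least once."""
    seen = set(generated)
    return {tok: z - (alpha if tok in seen else 0.0) for tok, z in logits.items()}

logits = {"very": 3.0, "quite": 2.5, "rather": 2.0}
history = ["very", "very", "very", "quite"]
print(apply_presence_penalty(logits, history, alpha=0.5))
# "very" and "quite" each drop by 0.5, however often they appeared
```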
Stop sequences:
- A custom list of strings; generation halts as soon as the model produces any of them.
- Critical for structured outputs (e.g., JSON), preventing spillover text.
- Lets you enforce strict response boundaries without heavy prompt engineering.
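Here is a simplified, post-hoc sketch of stop-sequence handling. It cuts the text at the earliest match and drops the stop string itself. Real inference servers check for matches while tokens stream in, and providers differ on whether they return the stop string. Treat this as an illustration under those assumptions. The sample text is invented.

```python
from typing import List, Optional, Tuple

def apply_stop_sequences(text: str, stops: List[str]) -> Tuple[str, Optional[str]]:
    """Cut text at the earliest stop sequence; the stop string itself is dropped."""
    cut, hit = len(text), None
    for s in stops:
        i = text.find(s)
        if i != -1 and i < cut:               # keep the earliest match across all stops
            cut, hit = i, s
    return text[:cut], hit

raw = '{"name": "Ada", "role": "engineer"}\n\nHope this helps! Let me know if you need more.'
print(apply_stop_sequences(raw, ["\n\n", "END"]))
# ('{"name": "Ada", "role": "engineer"}', '\n\n')
```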

👉 Over to you: What other LLM generation params have we missed?

Thanks for reading!


Daily Dose of Data Science
