LLMs do not always follow instructions reliably. Structured outputs are a feature that ensures model responses follow a strict, user-defined format (such as a JSON or XML schema) instead of free-form text, making outputs predictable, machine-readable, and easy to integrate into applications.
Natively Supported Structured Outputs
A number of LLM services (e.g., vLLM, OpenAI, Anthropic Claude, AWS GovCloud Bedrock) include native support for producing structured outputs. To take advantage of this capability when it exists, you can supply a Pydantic model representing the desired output format to the response_format parameter of LLM.prompt.
Anthropic or OpenAI
```python
from onprem import LLM
from pydantic import BaseModel

class ContactInfo(BaseModel):
    name: str
    email: str
    plan_interest: str
    demo_requested: bool

# Create LLM instance for Claude
llm = LLM("anthropic/claude-3-7-sonnet-latest")

# Use structured output - this should automatically use Claude's native API
result = llm.prompt(
    "Extract info from: John Smith (john@example.com) is interested in our "
    "Enterprise plan and wants to schedule a demo for next Tuesday at 2pm.",
    response_format=ContactInfo,
)

print(f"Name: {result.name}")
print(f"Email: {result.email}")
print(f"Plan: {result.plan_interest}")
print(f"Demo: {result.demo_requested}")
```
The above approach using the response_format parameter works with both Anthropic and OpenAI as LLM backends.
AWS GovCloud Bedrock
A structured output example using AWS GovCloud Bedrock is shown here.
VLLM
For vLLM, you can generate structured outputs by supplying vLLM's documented extra parameters through the extra_body argument, as illustrated below:
```python
from onprem import LLM

llm = LLM(model_url='http://localhost:8666/v1', api_key='test123', model='MyGPT')

# classification-based structured outputs
result = llm.prompt(
    'Classify this sentiment: vLLM is wonderful!',
    extra_body={"structured_outputs": {"choice": ["positive", "negative"]}},
)
# OUTPUT: positive

# JSON-based structured outputs
from pydantic import BaseModel, Field

class MeasuredQuantity(BaseModel):
    value: str = Field(description="numerical value - number only")
    unit: str = Field(description="unit of measurement")

response_format = {
    "type": "json_schema",
    "json_schema": {
        "name": MeasuredQuantity.__name__.lower(),
        "schema": MeasuredQuantity.model_json_schema(),
    },
}
result = llm.prompt(
    'Extract unit and value from the following: He was going 35 mph.',
    response_format=response_format,
)
# OUTPUT: { "value": "35", "unit": "mph" }

# RegEx-based structured outputs
result = llm.prompt(
    'Generate an example email address for Alan Turing, who works in Enigma. '
    'End in ".com" and a new line.',
    extra_body={"structured_outputs": {"regex": r"\w+@\w+\.com\n"}, "stop": ["\n"]},
)
# OUTPUT: Alan_Turing@enigma.com
```
Ollama
When using Ollama as the backend, you can supply a JSON schema via the format parameter:

```python
from onprem import LLM
from pydantic import BaseModel

class Pet(BaseModel):
    name: str
    animal: str
    age: int
    color: str | None
    favorite_toy: str | None

class PetList(BaseModel):
    pets: list[Pet]

llm = LLM('ollama/llama3.1')
result = llm.prompt(
    'I have two cats named Luna and Loki...',
    format=PetList.model_json_schema(),
)
```
When using an LLM backend that does not natively support structured outputs, supplying a Pydantic model via the response_format parameter of LLM.prompt automatically falls back to a prompt-based approach to structured outputs, described next.
Tip: When using natively-supported structured outputs, it is important to include an actual instruction in the prompt (e.g., “Classify this sentiment”, “Extract info from”, etc.). With prompt-based structured outputs (described below), the instruction can often be omitted.
Prompt-Based Structured Outputs
The LLM.pydantic_prompt method also allows you to specify the desired structure of the LLM's output as a Pydantic model. Internally, LLM.pydantic_prompt wraps the user-supplied prompt within a larger prompt instructing the LLM to output results in a specific JSON format. This is sometimes less efficient and less reliable than the native methods described above, but it is applicable to any LLM. Since calling LLM.prompt with the response_format parameter automatically invokes LLM.pydantic_prompt when necessary, you will typically not need to call LLM.pydantic_prompt directly.
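The wrapping idea can be sketched with the standard library alone. This is an illustrative simplification, not OnPrem.LLM's actual internals: build_schema_prompt, the hand-written schema, and the simulated model reply are all assumptions made for the example.

```python
import json

def build_schema_prompt(user_prompt: str, schema: dict) -> str:
    """Wrap a user prompt with instructions to emit JSON matching a schema.
    (Illustrative sketch only; not the actual OnPrem.LLM implementation.)"""
    return (
        f"{user_prompt}\n\n"
        "Respond ONLY with a JSON object that conforms to this JSON Schema:\n"
        f"{json.dumps(schema)}"
    )

# hand-written schema standing in for Joke.model_json_schema()
joke_schema = {
    "type": "object",
    "properties": {"setup": {"type": "string"}, "punchline": {"type": "string"}},
    "required": ["setup", "punchline"],
}

wrapped = build_schema_prompt("Tell me a joke.", joke_schema)

# simulate the model's JSON reply, then parse it back into a dict
simulated_reply = '{"setup": "Why did the chicken cross the road?", "punchline": "To get to the other side."}'
parsed = json.loads(simulated_reply)
print(parsed["setup"])
```

In the real method, the parsed dictionary is additionally validated against the Pydantic model, so malformed replies can be detected and retried or reported.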
```python
from onprem import LLM
from pydantic import BaseModel, Field

class Joke(BaseModel):
    setup: str = Field(description="question to set up a joke")
    punchline: str = Field(description="answer to resolve the joke")

llm = LLM(default_model='llama', verbose=False)
structured_output = llm.pydantic_prompt('Tell me a joke.', pydantic_model=Joke)
```
{
"setup": "Why couldn't the bicycle stand up by itself?",
"punchline": "Because it was two-tired!"
}
```python
structured_output
```

Joke(setup="Why couldn't the bicycle stand up by itself?", punchline='Because it was two-tired!')
Why couldn't the bicycle stand up by itself?
Because it was two-tired!
Guider
The Guider in OnPrem.LLM, a simple interface to the Guidance package, can be used to guide the output of an LLM based on conditions and constraints that you supply.
Let’s begin by creating an onprem.LLM instance.
```python
from onprem import LLM

llm = LLM(n_gpu_layers=-1, verbose=False)  # set based on your system
```
```python
from onprem.pipelines.guider import Guider

guider = Guider(llm)
```
The guider.prompt method accepts Guidance prompts as input. (You can refer to the Guidance documentation for information on how to construct such prompts.)
Here, we’ll show some examples (mostly taken from the Guidance documentation) and begin with importing some Guidance functions.
The select function
The select function allows you to guide the LLM to generate output from only a finite set of alternatives. The Guider.prompt method returns a dictionary with the answer associated with the key you supply in the prompt.
```python
from guidance import select
```
```python
# example from the Guidance documentation
guidance_program = 'Do you want a joke or a poem? A ' + select(['joke', 'poem'], name='answer')
guider.prompt(guidance_program)
```
Do you want a joke or a poem? A joke
{'answer': 'joke'}
The gen function
The gen function allows you to place conditions and constraints on the generated output.
```python
from guidance import gen
```
```python
guider.prompt(f'The capital of France is {gen("answer", max_tokens=1, stop=".")}')
```
The capital of France is Paris
{'answer': 'Paris'}
You can also use regular expressions to guide the output.
```python
prompt = """Question: Luke has ten balls. He gives three to his brother. How many balls does he have left?
Answer: """ + gen('answer', regex=r'\d+')
guider.prompt(prompt)
```
Question: Luke has ten balls. He gives three to his brother. How many balls does he have left?
Answer: 7
{'answer': '7'}
```python
prompt = 'Generate a list of numbers in descending order. 19, 18,' + gen('answer', max_tokens=50, stop_regex=r'[^\d]7[^\d]')
guider.prompt(prompt)
```
Generate a list of numbers in descending order. 19, 18, 17, 16, 15, 14, 13, 12, 11, 10, 9, 8,
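To see why generation halts where it does, the stop_regex can be checked offline with Python's re module. This is a stdlib-only sketch; the full countdown string is an illustrative assumption standing in for what the model would otherwise produce.

```python
import re

# the countdown the model would otherwise keep generating
full = '19, 18, 17, 16, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6'

# stop_regex from the example above: a standalone 7 (non-digit, '7', non-digit),
# so the '7' inside '17' does not trigger it
m = re.search(r'[^\d]7[^\d]', full)

# generation stops just before the first match
truncated = full[:m.start()]
print(truncated)  # stops before ' 7,', matching the truncated output above
```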
Using select and gen, you can guide the LLM to produce outputs conforming to the structure that you want (e.g., JSON).
Let’s create a prompt for generating fictional D&D-type characters.
```python
sample_weapons = ["sword", "axe", "mace", "spear", "bow", "crossbow"]
sample_armour = ["leather", "chainmail", "plate"]

def generate_character_prompt(
    character_one_liner,
    weapons: list[str] = sample_weapons,
    armour: list[str] = sample_armour,
    n_items: int = 3,
):
    prompt = ''
    prompt += "{"
    prompt += f'"description" : "{character_one_liner}",'
    prompt += '"name" : "' + gen(name="character_name", stop='"') + '",'
    # With guidance, we can call a GPU rather than merely random.randint()
    prompt += '"age" : ' + gen(name="age", regex="[0-9]+") + ','
    prompt += '"armour" : "' + select(armour, name="armour") + '",'
    prompt += '"weapon" : "' + select(weapons, name="weapon") + '",'
    prompt += '"class" : "' + gen(name="character_class", stop='"') + '",'
    prompt += '"mantra" : "' + gen(name="mantra", stop='"') + '",'
    # Again, we can avoid calling random.randint() like a pleb
    # (note: name="age" is reused here, so it overwrites the earlier capture)
    prompt += '"strength" : ' + gen(name="age", regex="[0-9]+") + ','
    prompt += '"quest_items" : [ '
    for i in range(n_items):
        prompt += '"' + gen(name="items", list_append=True, stop='"') + '"'
        # We now pause a moment to express our thoughts on the JSON
        # specification's dislike of trailing commas
        if i < n_items - 1:
            prompt += ','
    prompt += "]"
    prompt += "}"
    return prompt
```
```python
d = guider.prompt(generate_character_prompt("A quick and nimble fighter"))
```
{"description" : "A quick and nimble fighter", "name" : "Rogue", "age" : 0, "armour" : "leather", "weapon" : "crossbow", "class" : "rogue", "mantra" : "Stay nimble, stay quick.", "strength" : 10, "quest_items" : [ "a set of thieves' tools", "a map of the local area", "a set of lockpicks"]}
The Generated Dictionary:
```python
d
```
{'items': ['a set of lockpicks',
'a map of the local area',
"a set of thieves' tools"],
'age': '10',
'mantra': 'Stay nimble, stay quick.',
'character_class': 'rogue',
'weapon': 'crossbow',
'armour': 'leather',
'character_name': 'Rogue'}
Convert to JSON
```python
import json

print(json.dumps(d, indent=4))
```
{
"items": [
"a set of lockpicks",
"a map of the local area",
"a set of thieves' tools"
],
"age": "10",
"mantra": "Stay nimble, stay quick.",
"character_class": "rogue",
"weapon": "crossbow",
"armour": "leather",
"character_name": "Rogue"
}