Sampler SDK
The Sampler SDK gives you access to the raw Grok-1 model, allowing you to do your own prompt engineering. Note that the version of Grok-1 currently available in the IDE and via the API is fine-tuned for conversations, so phrasing a task as a dialogue often yields the best results.
Getting started
To get started with the Sampler SDK, create a new client and access its `sampler` property.
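A minimal sketch of that setup (API credentials are assumed to be configured in the environment; the examples below construct the client the same way):

```python
import xai_sdk

# Create a new client.
client = xai_sdk.Client()

# The sampler property exposes the raw-model completion API.
sampler = client.sampler
```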
Sampling from the model
The main function of the Sampler SDK is the `sample` function, which, given a prompt, samples tokens from the model. It returns an async iterator that streams out the generated tokens.
"""A simple example demonstrating text completion."""
import asyncio
import xai_sdk
async def main():
"""Runs the example."""
client = xai_sdk.Client()
prompt = "The answer to life and the universe is"
print(prompt, end="")
async for token in client.sampler.sample(prompt="", inputs=(prompt,), max_len=3):
print(token.token_str, end="")
print("")
asyncio.run(main())
The sampler can also provide completion over multimodal inputs.
"""An example demonstrating multimodal completion."""
import asyncio
import xai_sdk
async def main():
"""Runs the example."""
with open("dog.png", 'rb') as file:
image = file.read()
client = xai_sdk.Client()
prompt = "What is this? "
print(prompt, end="")
async for token in client.sampler.sample(
prompt="",
inputs=(prompt, image),
max_len=8,
model_name="vlm-1"
):
print(token.token_str, end="")
print("")
asyncio.run(main())
API Reference
xai_sdk.sampler
Sampler API to generate text completions.
This API gives access to the raw underlying models and is the most versatile, but also the most complex, way to interact with our models.
xai_sdk.sampler.AsyncSampler
Allows sampling from the raw model API. All functions are asynchronous.
Source code in xai_sdk/sampler.py
xai_sdk.sampler.AsyncSampler.__init__(stub, initial_rng_seed=None)
Initializes a new instance of the `Sampler` class.
Parameters:
Name | Type | Description | Default
---|---|---|---
`stub` | `SamplerStub` | The gRPC stub to use for interacting with the API. | *required*
`initial_rng_seed` | `Optional[int]` | First RNG seed to use for sampling. Each time we sample from the model and no RNG seed is explicitly specified, we deterministically generate a new seed based on the initial seed. This ensures that the generated responses are deterministic given a call order. If no initial seed is specified, we sample a random initial seed. | `None`
Source code in xai_sdk/sampler.py
xai_sdk.sampler.AsyncSampler.sample(*, prompt, inputs=(), model_name='', max_len=256, temperature=0.7, nucleus_p=0.95, stop_tokens=None, stop_strings=None, rng_seed=None, return_attention=False, allowed_tokens=None, disallowed_tokens=None, augment_tokens=True)
async
Generates a model response by continuing `prompt`.
Parameters:
Name | Type | Description | Default
---|---|---|---
`prompt` | `Union[str, Sequence[int], Sequence[Token]]` | [Deprecated, use `inputs` instead] Prompt to continue. This can either be a string, a sequence of token IDs, or a sequence of `Token`s. | *required*
`inputs` | `Sequence[Union[str, Sequence[int], bytes]]` | Multimodal input of the model. This can be a sequence of strings, token IDs, or images as bytes or base64-encoded strings. | `()`
`model_name` | `str` | Name of the model to sample from. Leave empty to sample from the default model. | `''`
`max_len` | `int` | Maximum number of tokens to generate. | `256`
`temperature` | `float` | Temperature of the final softmax operation. The lower the temperature, the lower the variance of the token distribution. In the limit, the distribution collapses onto the single token with the highest probability. | `0.7`
`nucleus_p` | `float` | Threshold of the Top-P sampling technique: we rank all tokens by their probability and then only actually sample from the set of tokens that ranks in the Top-P percentile of the distribution. | `0.95`
`stop_tokens` | `Optional[list[str]]` | A list of strings, each of which will be mapped independently to a single token. If a string does not map cleanly to one token, it will be silently ignored. If the network samples one of these tokens, sampling is stopped and the stop token is not included in the response. | `None`
`stop_strings` | `Optional[list[str]]` | A list of strings. If any of these strings occurs in the network output, sampling is stopped but the string that triggered the stop is included in the response. Note that the response may be longer than the stop string. For example, if the stop string is "Hel" and the network predicts the single-token response "Hello", sampling is stopped but the response will still read "Hello". | `None`
`rng_seed` | `Optional[int]` | Seed of the random number generator used to sample from the model outputs. If unspecified, a seed is chosen deterministically from the `initial_rng_seed`. | `None`
`return_attention` | `bool` | If true, returns the attention mask. Note that this can significantly increase the response size for long sequences. | `False`
`allowed_tokens` | `Optional[Sequence[Union[int, str]]]` | If set, only these tokens can be sampled. Invalid input tokens are ignored. Only one of `allowed_tokens` and `disallowed_tokens` may be set. | `None`
`disallowed_tokens` | `Optional[Sequence[Union[int, str]]]` | If set, these tokens cannot be sampled. Invalid input tokens are ignored. Only one of `allowed_tokens` and `disallowed_tokens` may be set. | `None`
`augment_tokens` | `bool` | If true, strings passed to `stop_tokens`, `allowed_tokens`, and `disallowed_tokens` are augmented to include both the original string and the variant with a leading whitespace. | `True`
Yields:
Type | Description
---|---
`AsyncGenerator[Token, None]` | A sequence of `Token`s generated by continuing the prompt.
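These options compose; the following hedged sketch combines a fixed `rng_seed`, a low `temperature`, and a `stop_strings` entry (the prompt, seed, and limits are illustrative values, not recommendations):

```python
"""Samples a short, reproducible completion bounded by a stop string."""

import asyncio

import xai_sdk


async def main():
    client = xai_sdk.Client()
    async for token in client.sampler.sample(
        prompt="",
        inputs=("Q: What is the capital of France?\nA:",),
        max_len=32,
        temperature=0.2,      # low variance: concentrates on high-probability tokens
        stop_strings=["\n"],  # stop once a newline appears; the newline stays in the output
        rng_seed=1234,        # fixed seed makes repeated calls deterministic
    ):
        print(token.token_str, end="")
    print("")


asyncio.run(main())
```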
Source code in xai_sdk/sampler.py
xai_sdk.sampler.AsyncSampler.tokenize(prompt, model_name='')
async
Converts the given prompt text into a sequence of discrete tokens.
Parameters:
Name | Type | Description | Default
---|---|---|---
`prompt` | `str` | Text to convert into a sequence of tokens. | *required*
`model_name` | `str` | Model whose tokenizer should be used. Make sure to use the same value when tokenizing and when sampling, as different models use different tokenizers. Leave empty to use the default model's tokenizer. | `''`
Returns:
Type | Description
---|---
`list[Token]` | A sequence of discrete tokens that represent the original `prompt`.
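A short sketch of the round trip (the prompt is arbitrary; `token_str` is the string representation used in the streaming examples above):

```python
"""Tokenizes a prompt and prints each token's string representation."""

import asyncio

import xai_sdk


async def main():
    client = xai_sdk.Client()
    tokens = await client.sampler.tokenize("The answer to life and the universe is")
    # Each element is a Token; token_str holds its text.
    print([token.token_str for token in tokens])


asyncio.run(main())
```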
Source code in xai_sdk/sampler.py
xai_sdk.sampler.Token
dataclass
A token is an element of our vocabulary that has a unique index and string representation.
A token can either be sampled from a model or provided by the user (i.e. prompted). If the token comes from the model, we may have additional metadata such as its sampling probability, the attention pattern used when sampling the token, and alternative tokens.
Source code in xai_sdk/sampler.py
xai_sdk.sampler.Token.from_proto(proto)
classmethod
Converts the protocol buffer instance to a `Token` instance.
Source code in xai_sdk/sampler.py
xai_sdk.sampler.log_budget_update(budget)
Logs a budget update.