Models
This page is the single, canonical reference for which models are available right now, how to choose the right one, and how to address models in API calls.
deAPI gives you a unified API across multiple open-source models running on a decentralized GPU cloud. Models evolve frequently; always rely on the live list (endpoint at the bottom of this page) for current availability and IDs.
How model selection works
- Every task requires a `model` parameter. Example: `model: "Flux1schnell"` for Text-to-Image, or `model: "WhisperLargeV3"` for speech/video transcription.
- Display names vs. API IDs. In tables and the UI we show human-friendly names (e.g., "FLUX.1-schnell"); the API accepts stable IDs (e.g., `Flux1schnell`). Use the Models endpoint to fetch the exact `id` strings.
- Quality ↔ speed trade-off. Larger models often yield higher quality but cost more and take longer. Use the Price Calculator on the homepage to estimate cost before running large jobs.
- Versioning & lifecycle. Models may be updated, superseded, or deprecated. Your application should resolve the model ID at runtime from the live list (a sketch follows this list), or pin a specific version string if reproducibility is critical.
- Safety & acceptable use. Follow the Terms of Service. Some content types may be blocked or filtered; see the Safety section in each task's docs.
Supported tasks & models (curated)
The table below lists the core, production-ready models available today. For the authoritative list (including experimental or newly added ones), use the Models endpoint.
| Task | What it does | Typical use cases | Model | API ID | Endpoint |
| --- | --- | --- | --- | --- | --- |
| Text-to-Image | Generate images from text | Concept art, prototyping, creative exploration | FLUX.1-schnell | `Flux1schnell` | `txt2img` |
| Text-to-Speech | Turn text into natural voice | Narration, accessibility, product voices in multiple languages and tones | Kokoro-82M | `Kokoro` | `txt2audio` |
| Video-to-Text | Transcribe video into text | Subtitles, captions, indexing, SEO, datasets (with optional timestamps) | Whisper large-v3 | `WhisperLargeV3` | `video2txt` |
| Image/Text-to-Video | Generate short AI videos | Cinematic motion, transitions, stylization | LTX-Video-0.9.8 13B | `Ltxv_13B_0_9_8_Distilled_FP8` | `img2video`, `txt2video` |
| Image-to-Text | Extract meaning from images | Descriptions, OCR, accessibility, moderation | Nanonets_Ocr_S_F16 | `Nanonets-Ocr-S-F16` | `img2txt` |
| Audio-to-Text | Convert audio into text | Subtitles, notes, search, accessibility (multi-language) | Whisper large-v3 | `WhisperLargeV3` | `video2txt` |
| Text-to-Embedding | Create vector embeddings | Search, RAG, semantic similarity, clustering | BGE M3 | `Bge_M3_FP16` | `txt2embedding` |
Note: Display names and example IDs above are provided for clarity; always fetch the live model list for the exact `id` strings currently enabled on the network.
Picking the right model
- Image Generation (Text-to-Image): Start with `Flux1schnell` for fast iteration. Increase steps/resolution for quality; control style via the prompt and (optionally) LoRAs.
- Speech Generation (TTS): `Kokoro` offers natural prosody across multiple languages/voices. Choose voice/language in the payload; test speed vs. quality for your use case.
- Text Transcription (Video/Audio-to-Text): `WhisperLargeV3` is a strong general-purpose baseline with robust multilingual support. For long videos, enable timestamps and chunking.
- Text Recognition (Image-to-Text): `Nanonets-Ocr-S-F16` targets clear captions and text extraction. For complex layouts, consider multiple passes or post-processing.
- Video Generation (Image-to-Video / Text-to-Video): `Ltxv_13B_0_9_8_Distilled_FP8` is suited to short clips and stylized motion. Start with a low duration/frame count to validate aesthetics, then scale up.
- Embedding (Text-to-Embedding): `Bge_M3_FP16` provides dense vector embeddings for semantic search, clustering, and retrieval-augmented generation (RAG). Use it for similarity queries or knowledge-base indexing (see the request sketch after this list).
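The API usage examples below don't cover embeddings, so here is a minimal request sketch. It assumes the `txt2embedding` endpoint follows the same URL pattern as the other task endpoints; the `text` field name is a hypothetical placeholder, so check the Text-to-Embedding endpoint docs for the actual payload:

```bash
curl -X POST "https://api.deapi.ai/api/v1/client/txt2embedding" \
  -H "Authorization: Bearer $DEAPI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "vector search lets you query by meaning, not keywords",
    "model": "Bge_M3_FP16"
  }'
```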
For a deeper discussion of strengths/limits and typical parameter sets per task, see Model Selection.
API usage examples
1) List models (fetch IDs at runtime)
```bash
curl -X GET "https://api.deapi.ai/api/v1/client/models" \
  -H "Authorization: Bearer $DEAPI_API_KEY"
```
2) Use a model in Text-to-Image
```bash
curl -X POST "https://api.deapi.ai/api/v1/client/txt2img" \
  -H "Authorization: Bearer $DEAPI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "isometric cozy cabin at dusk, soft rim light, artstation trending",
    "model": "Flux1schnell",
    "width": 768,
    "height": 768,
    "steps": 25,
    "guidance": 3.5,
    "seed": 12345
  }'
```
3) Use a model in Video-to-Text (YouTube)
```bash
curl -X POST "https://api.deapi.ai/api/v1/client/vid2txt" \
  -H "Authorization: Bearer $DEAPI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "youtube_url": "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
    "include_ts": true,
    "model": "WhisperLargeV3"
  }'
```
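4) Use a model in Text-to-Speech (sketch)
This page has no official TTS example; the request below assumes the `txt2audio` endpoint from the table follows the same pattern as the other tasks, and the `text` and `voice` field names are hypothetical placeholders:

```bash
curl -X POST "https://api.deapi.ai/api/v1/client/txt2audio" \
  -H "Authorization: Bearer $DEAPI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Welcome aboard! Your narration is ready.",
    "voice": "default",
    "model": "Kokoro"
  }'
```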
Jobs return a `request_id`. Poll results with `GET /api/v1/client/request-status/{request_id}`.
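A minimal polling loop might look like the following; the `status` field and its `pending` value are assumptions about the response shape, so adapt them to what Get Results actually returns:

```bash
# Poll a submitted job until it leaves the (assumed) "pending" state.
REQUEST_ID="..."   # taken from the submit response
while true; do
  RESULT=$(curl -s "https://api.deapi.ai/api/v1/client/request-status/$REQUEST_ID" \
    -H "Authorization: Bearer $DEAPI_API_KEY")
  STATUS=$(echo "$RESULT" | jq -r '.status? // "unknown"')
  [ "$STATUS" != "pending" ] && break
  sleep 5   # back off between polls instead of hammering the API
done
echo "$RESULT"
```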
Best practices
- Resolve the model list dynamically. Don't hardcode IDs; fetch once at startup or periodically.
- Pin versions for reproducibility. If you need bit-for-bit repeats (e.g., in T2I), pin the model version and set a seed.
- Budget before scaling. Larger models and higher resolution/step counts cost more; use the calculator on the homepage.
- Handle deprecation. Implement a fallback path in case a model becomes unavailable (e.g., switch to a recommended successor), as sketched below.
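A hedged sketch of such a fallback, reusing the live model list from above (the successor ID is a placeholder, and the assumption that IDs surface as `id` fields in the response is illustrative):

```bash
# Return the first candidate model ID that appears in the live list.
pick_model() {
  local live candidate
  live=$(curl -s "https://api.deapi.ai/api/v1/client/models" \
    -H "Authorization: Bearer $DEAPI_API_KEY" | jq -r '.. | .id? // empty')
  for candidate in "$@"; do
    if grep -qxF "$candidate" <<<"$live"; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  return 1   # nothing available: let the caller decide how to fail
}

# "Hypothetical-Successor-Id" is a placeholder, not a real model ID.
MODEL=$(pick_model "Flux1schnell" "Hypothetical-Successor-Id") \
  || { echo "no usable model available" >&2; exit 1; }
```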
Related docs
Model Selection: how to choose the best model per task and budget.
Endpoints: Text-to-Image, Text-to-Speech, Image-to-Text (OCR), Video-to-Text, Get Results, Check Balance.
Live models (API)
Use the endpoint below to retrieve the current, authoritative list of models (with the `id` strings to use in requests):

```
GET https://api.deapi.ai/api/v1/client/models
```

(The response includes stable `id` values to pass as the `model` parameter in any task endpoint.)