Models
This page is the single, canonical reference for which models are available right now, how to choose the right one, and how to address models in API calls.
deAPI gives you a unified API across multiple open-source models running on a decentralized GPU cloud. Models evolve frequently; always rely on the live list (endpoint at the bottom of this page) for current availability and IDs.
How model selection works
- Every task requires a `model` parameter. Example: `model: "Flux1schnell"` for Text-to-Image, or `model: "WhisperLargeV3"` for speech/video transcription.
- Display names vs. API IDs. In tables and the UI we show human-friendly names (e.g., "FLUX.1-schnell"); the API accepts stable IDs (e.g., `Flux1schnell`). Use the Models endpoint to fetch the exact `id` strings.
- Quality ↔ speed trade-off. Larger models often yield higher quality but cost more and take longer. Use the Price Calculator on the homepage to estimate cost before running large jobs.
- Versioning & lifecycle. Models may be updated, superseded, or deprecated. Your application should resolve the model ID at runtime from the live list (a sketch follows this list), or pin a specific version string if reproducibility is critical.
- Safety & acceptable use. Follow the Terms of Service. Some content types may be blocked or filtered; see the Safety section in each task's docs.
Supported tasks & models (curated)
The table below lists the core, production-ready models available today. For the authoritative list (including experimental or newly added ones), use the Models endpoint.
| Task | What it does | Typical use cases | Model | API ID | Endpoint |
| --- | --- | --- | --- | --- | --- |
| Text-to-Image | Generate images from text | Concept art, prototyping, creative exploration | FLUX.1-schnell | `Flux1schnell` | `txt2img` |
| Text-to-Speech | Turn text into natural voice | Narration, accessibility, product voices in multiple languages and tones | Kokoro-82M | `Kokoro` | `txt2audio` |
| Video-to-Text | Transcribe video into text | Subtitles, captions, indexing, SEO, datasets (with optional timestamps) | Whisper large-v3 | `WhisperLargeV3` | `video2txt` |
| Image/Text-to-Video | Generate short AI videos | Cinematic motion, transitions, stylization | LTX-Video-0.9.8 13B | `Ltxv_13B_0_9_8_Distilled_FP8` | `img2video`, `txt2video` |
| Image-to-Text | Extract meaning from images | Descriptions, OCR, accessibility, moderation | Nanonets_Ocr_S_F16 | `Nanonets-Ocr-S-F16` | `img2txt` |
| Audio-to-Text | Convert audio into text | Subtitles, notes, search, accessibility (multi-language) | Whisper large-v3 | `WhisperLargeV3` | `video2txt` |
| Text-to-Embedding | Create vector embeddings | Search, RAG, semantic similarity, clustering | BGE M3 | `Bge_M3_FP16` | `txt2embedding` |
Note: Display names and example IDs above are provided for clarity; always fetch the live model list for the exact `id` strings currently enabled on the network.
Picking the right model
- Image Generation (Text-to-Image): Start with `Flux1schnell` for fast iteration. Increase steps/resolution for quality; control style via the prompt and (optionally) LoRAs.
- Speech Generation (TTS): `Kokoro` offers natural prosody across multiple languages/voices. Choose voice/language in the payload; test speed vs. quality for your use case.
- Text Transcription (Video/Audio-to-Text): `WhisperLargeV3` is a strong general-purpose baseline with robust multilingual support. For long videos, enable timestamps and chunking.
- Text Recognition (Image-to-Text): `Nanonets-Ocr-S-F16` targets clear captions and text extraction. For complex layouts, consider multiple passes or post-processing.
- Video Generation (Image-to-Video / Text-to-Video): `Ltxv_13B_0_9_8_Distilled_FP8` is suited to short clips and stylized motion. Start with a low duration/frame count to validate aesthetics, then scale up.
- Embedding (Text-to-Embedding): `Bge_M3_FP16` provides dense vector embeddings for semantic search, clustering, and retrieval-augmented generation (RAG). Use it for similarity queries or knowledge-base indexing (see the request sketch after this list).
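The API usage examples below don't cover embeddings, so here is a minimal request sketch. It assumes the `txt2embedding` endpoint follows the same URL pattern as the other task endpoints; the `text` field name is a hypothetical placeholder, so check the Text-to-Embedding endpoint docs for the actual payload:

```bash
curl -X POST "https://api.deapi.ai/api/v1/client/txt2embedding" \
  -H "Authorization: Bearer $DEAPI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "vector search lets you query by meaning, not keywords",
    "model": "Bge_M3_FP16"
  }'
```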
For a deeper discussion of strengths/limits and typical parameter sets per task, see Model Selection.
API usage examples
1) List models (fetch IDs at runtime)
```bash
curl -X GET "https://api.deapi.ai/api/v1/client/models" \
  -H "Authorization: Bearer $DEAPI_API_KEY"
```
2) Use a model in Text-to-Image
```bash
curl -X POST "https://api.deapi.ai/api/v1/client/txt2img" \
  -H "Authorization: Bearer $DEAPI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "isometric cozy cabin at dusk, soft rim light, artstation trending",
    "model": "Flux1schnell",
    "width": 768,
    "height": 768,
    "steps": 25,
    "guidance": 3.5,
    "seed": 12345
  }'
```
3) Use a model in Video-to-Text (YouTube)
```bash
curl -X POST "https://api.deapi.ai/api/v1/client/vid2txt" \
  -H "Authorization: Bearer $DEAPI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "youtube_url": "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
    "include_ts": true,
    "model": "WhisperLargeV3"
  }'
```
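4) Use a model in Text-to-Speech (sketch)
This page has no official TTS example; the request below assumes the `txt2audio` endpoint from the table follows the same pattern as the other tasks, and the `text` and `voice` field names are hypothetical placeholders:

```bash
curl -X POST "https://api.deapi.ai/api/v1/client/txt2audio" \
  -H "Authorization: Bearer $DEAPI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Welcome aboard! Your narration is ready.",
    "voice": "default",
    "model": "Kokoro"
  }'
```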
Jobs return a `request_id`. Poll results with `GET /api/v1/client/request-status/{request_id}`.
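A minimal polling loop might look like the following; the `status` field and its `pending` value are assumptions about the response shape, so adapt them to what Get Results actually returns:

```bash
# Poll a submitted job until it leaves the (assumed) "pending" state.
REQUEST_ID="..."   # taken from the submit response
while true; do
  RESULT=$(curl -s "https://api.deapi.ai/api/v1/client/request-status/$REQUEST_ID" \
    -H "Authorization: Bearer $DEAPI_API_KEY")
  STATUS=$(echo "$RESULT" | jq -r '.status? // "unknown"')
  [ "$STATUS" != "pending" ] && break
  sleep 5   # back off between polls instead of hammering the API
done
echo "$RESULT"
```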
Best practices
- Resolve the model list dynamically. Don't hardcode IDs; fetch once at startup or periodically.
- Pin versions for reproducibility. If you need bit-for-bit repeats (e.g., in T2I), pin the model version and set a seed.
- Budget before scaling. Larger models and higher resolution/step counts cost more; use the calculator on the homepage.
- Handle deprecation. Implement a fallback path in case a model becomes unavailable (e.g., switch to a recommended successor), as sketched below.
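A hedged sketch of such a fallback, reusing the live model list from above (the successor ID is a placeholder, and the assumption that IDs surface as `id` fields in the response is illustrative):

```bash
# Return the first candidate model ID that appears in the live list.
pick_model() {
  local live candidate
  live=$(curl -s "https://api.deapi.ai/api/v1/client/models" \
    -H "Authorization: Bearer $DEAPI_API_KEY" | jq -r '.. | .id? // empty')
  for candidate in "$@"; do
    if grep -qxF "$candidate" <<<"$live"; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  return 1   # nothing available: let the caller decide how to fail
}

# "Hypothetical-Successor-Id" is a placeholder, not a real model ID.
MODEL=$(pick_model "Flux1schnell" "Hypothetical-Successor-Id") \
  || { echo "no usable model available" >&2; exit 1; }
```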
Related docs
Model Selection: how to choose the best model per task and budget.
Endpoints: Text-to-Image, Text-to-Speech, Image-to-Text (OCR), Video-to-Text, Get Results, Check Balance.
Live models (API)
Use the endpoint below to retrieve the current, authoritative list of models (with the `id` strings to use in requests):

```
GET https://api.deapi.ai/api/v1/client/models
```

(The response includes stable `id` values to pass as the `model` parameter in any task endpoint.)