Vectorize Your Data

Upload chunked documents or raw text. Select your tier and target database format. Get production-ready vectors.

Drop file or click to upload
JSON (chunked), TXT, or paste text below
Script
FREE
100 chunks/day
Agent
$0.0005
per chunk
Pro
$0.001
OpenAI 3-small
Pro+
$0.01
+ enrichment
Vectors
0
Dimensions
0
Format
JSON

Built for AI Agents

No API keys. No accounts. No OAuth. Just pay and use. Programmatic access designed for autonomous systems.

Zero Config

No API keys to manage. No developer accounts. Discover pricing at GET / and start embedding.

Pay Per Request

USDC on Solana. Sub-second finality. Include TX signature in X-PAYMENT header.

Machine Readable

JSON in, JSON out. Consistent error codes. Built for programmatic access by autonomous agents.

Agent Integration Example

# 1. Discover pricing
GET https://api.vectorize.cc/

# 2. Estimate cost
POST https://api.vectorize.cc/estimate
{"chunks": ["text1", "text2"], "tier": "pro"}

# 3. Pay (Solana USDC transfer to payTo address)

# 4. Vectorize with payment proof
POST https://api.vectorize.cc/vectorize/pro
X-PAYMENT: <solana-tx-signature>
{"chunks": ["text1", "text2"], "output_format": "pinecone"}
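In Python, step 4 can be assembled like this. A minimal sketch: the endpoint, tier, X-PAYMENT header, and body fields come from the flow above, while the helper name and the decision to build the request without sending it are illustrative.

```python
import json

API_BASE = "https://api.vectorize.cc"

def build_vectorize_request(chunks, tier, tx_signature, output_format="pinecone"):
    """Assemble the URL, headers, and body for a paid /vectorize call.

    `tx_signature` is the Solana transaction signature from step 3,
    passed as payment proof in the X-PAYMENT header.
    """
    url = f"{API_BASE}/vectorize/{tier}"
    headers = {
        "Content-Type": "application/json",
        "X-PAYMENT": tx_signature,  # payment proof, per the flow above
    }
    body = json.dumps({"chunks": chunks, "output_format": output_format})
    return url, headers, body

# Example: two chunks on the pro tier
url, headers, body = build_vectorize_request(
    ["text1", "text2"], tier="pro", tx_signature="<solana-tx-signature>"
)
```

From here, any HTTP client can send the request; the point is that the entire handshake is a URL, one header, and a JSON body, with no API key or OAuth step.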

Models By Tier

Each tier uses a specific model optimized for that price point. No configuration needed.

384 dim ~80MB model

all-MiniLM-L6-v2

Script & Agent Tiers

A distilled sentence-transformer built on Microsoft's MiniLM and fine-tuned on 1B+ sentence pairs from diverse sources. Runs entirely on the server with zero external API calls, so your data is never sent to a third-party provider. Its 6-layer architecture delivers roughly 5x faster inference than BERT-base while maintaining about 90% of its accuracy. Ideal for high-throughput workloads where speed and privacy matter more than maximum precision.

PERFECT FOR
  • Startups — MVP prototyping, validating RAG concepts before scaling
  • E-commerce — Product search, recommendation engines, catalog matching
  • Internal Tools — Company wikis, Slack/Discord bots, FAQ systems
  • Privacy-First — Healthcare prototypes, on-prem deployments, GDPR compliance
CAPABILITIES
  • Semantic similarity search
  • Question-answer matching
  • Duplicate detection
  • Clustering & classification
  • Paraphrase identification
SPECS
  • Dimensions: 384
  • Max tokens: 512
  • Latency: ~5ms/chunk
  • Provider: Local (sentence-transformers)
  • Languages: English-optimized
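To make the semantic similarity search capability concrete, here is how an agent might rank 384-dimensional vectors against a query vector once they come back from the API. The vectors below are random placeholders rather than real embeddings; cosine similarity over normalized vectors is the standard technique, not something specific to this service.

```python
import numpy as np

def top_k(query_vec, doc_vecs, k=2):
    """Return indices of the k most similar vectors by cosine similarity."""
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    scores = d @ q                       # cosine similarity per document
    return np.argsort(scores)[::-1][:k]  # highest score first

rng = np.random.default_rng(0)
docs = rng.normal(size=(5, 384))              # stand-ins for five 384-dim embeddings
query = docs[3] + 0.1 * rng.normal(size=384)  # a query very close to doc 3
ranked = top_k(query, docs)
```

With real embeddings the same two lines of ranking code work unchanged, which is why the tier's fixed 384-dimension output needs no configuration.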
RECOMMENDED
1536 dim OpenAI API

text-embedding-3-small

Pro Tier

OpenAI's production-grade embedding model, released January 2024. Achieves 62.3% on MTEB benchmark while being 5x cheaper than ada-002. Native support for 100+ languages with strong cross-lingual retrieval. Handles documents up to 8,191 tokens—perfect for long-form content without chunking overhead. The sweet spot between cost, speed, and quality for most production workloads.

PERFECT FOR
  • SaaS Products — Help centers, documentation search, in-app assistants
  • Customer Support — Ticket routing, knowledge base search, agent assist
  • Content Platforms — Article recommendations, content discovery, tagging
  • Enterprise Search — Confluence, Notion, Google Drive indexing
CAPABILITIES
  • Multilingual semantic search
  • Long document embedding
  • Cross-lingual retrieval
  • Code search & documentation
  • RAG with hybrid search
SPECS
  • Dimensions: 1,536
  • Max tokens: 8,191
  • Latency: ~50ms/chunk
  • MTEB score: 62.3%
  • Languages: 100+
3072 dim OpenAI API

text-embedding-3-large

Pro+ Tier

OpenAI's highest-fidelity embedding model with 3,072 dimensions. Achieves 64.6% on MTEB—top-tier performance for nuanced semantic understanding. Excels at distinguishing subtle differences in meaning, technical terminology, and domain-specific jargon. The extra dimensionality captures relationships that smaller models miss, critical for high-stakes retrieval where precision matters.

PERFECT FOR
  • Legal — Contract analysis, case law search, due diligence, compliance review
  • Medical/Biotech — Clinical trial data, research papers, drug interaction databases
  • Financial — SEC filings, earnings calls, risk analysis, regulatory documents
  • Research — Academic papers, patents, technical specifications, scientific literature
CAPABILITIES
  • Fine-grained semantic matching
  • Technical jargon understanding
  • Multi-hop reasoning support
  • Disambiguation of similar terms
  • Complex query understanding
SPECS
  • Dimensions: 3,072
  • Max tokens: 8,191
  • Latency: ~80ms/chunk
  • MTEB score: 64.6%
  • Languages: 100+
PRO+ BONUS: AI METADATA ENRICHMENT
Each chunk is analyzed by Claude to extract: topics (2-3 key themes), entities (people, organizations, concepts), and relationships (connections between entities). The results are returned as structured metadata alongside the embeddings, so you can supercharge vector search with semantic tags.
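An agent consuming the enrichment could fold it into database-ready records like this. A sketch: the field names topics, entities, and relationships match the description above, but the surrounding record shape and the helper name are assumptions for illustration.

```python
def attach_metadata(vector_record, enrichment):
    """Merge Pro+ enrichment fields into a vector record's metadata.

    `enrichment` is assumed to look like:
      {"topics": [...], "entities": [...], "relationships": [...]}
    """
    meta = dict(vector_record.get("metadata", {}))
    meta["topics"] = enrichment.get("topics", [])
    meta["entities"] = enrichment.get("entities", [])
    meta["relationships"] = enrichment.get("relationships", [])
    return {**vector_record, "metadata": meta}

record = {"id": "chunk-1", "values": [0.1, 0.2], "metadata": {"source": "doc.txt"}}
enriched = attach_metadata(record, {
    "topics": ["pricing"],
    "entities": ["Solana", "USDC"],
    "relationships": [{"from": "USDC", "to": "Solana", "type": "runs_on"}],
})
```

Because the tags land in ordinary metadata fields, they can drive metadata filters and hybrid search in any of the databases listed below without extra plumbing.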

Your Database, Your Choice

Export vectors in the exact format your database expects. Zero conversion needed.

ChromaDB
Open-source embedding database for AI apps
Pinecone
Managed vector database for production
Weaviate
Vector search with GraphQL interface
Qdrant
High-performance vector similarity search
Milvus
Cloud-native vector database at scale
pgvector
Vector search in PostgreSQL
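To show what zero conversion means in practice, here is a sketch of shaping one embedding into two of the layouts above. The field names follow Pinecone's upsert records (id, values, metadata) and ChromaDB's parallel-list add() arguments as commonly documented; check each database's current docs before relying on them.

```python
def to_pinecone(chunk_id, embedding, metadata):
    """One record in Pinecone's upsert layout: id, values, metadata."""
    return {"id": chunk_id, "values": embedding, "metadata": metadata}

def to_chroma(records):
    """ChromaDB's collection.add() takes parallel lists rather than records."""
    return {
        "ids": [r["id"] for r in records],
        "embeddings": [r["values"] for r in records],
        "metadatas": [r["metadata"] for r in records],
    }

recs = [to_pinecone("chunk-0", [0.1, 0.2, 0.3], {"source": "doc.txt"})]
batch = to_chroma(recs)
```

Requesting the right output_format up front skips exactly this kind of reshaping: the API hands back records already in the target layout.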