Technology

One NVIDIA-accelerated stack. Two products.

Veqa and Vigilium share the same GPU hardware, inference runtime, and compliance baseline. Single stack to operate, audit, and scale — different problem spaces it serves.

Core capabilities.

Purpose-built infrastructure for private, compliant AI inference.

NVIDIA GPU Infrastructure

NVIDIA RTX PRO 6000 Blackwell GPUs powering high-throughput, low-latency LLM inference and Voice AI workloads.

Zero Third-Party Cloud

All inference runs on dedicated, U.S.-based hardware. Customer data never touches a third-party server.

Dedicated, Isolated Compute

All inference runs on dedicated NVIDIA GPUs in U.S. facilities, isolated per customer at the container and process level — no shared multi-tenant inference with other vendors' workloads.

Open-Source Models

Standards-Based APIs

OpenAI-compatible inference endpoints, AudioSocket for telephony, REST + WebSocket for orchestration. No proprietary protocols.

Compliance-Ready Architecture

Architected to support HIPAA, SOC 2, and PCI DSS deployments. BAA available on request. Zero third-party data subprocessors in the inference path.

The NVIDIA stack we build on.

We build on the NVIDIA product line, not around it. Below is the NVIDIA software and hardware our architecture is built on — NVIDIA GPUs and CUDA in the PoC today, with the full serving stack on our path to GA.

Hardware

NVIDIA RTX PRO 6000 Blackwell

96 GB GDDR7 ECC, 24,064 CUDA cores. Workstation and Max-Q variants supported. A single-card GPU that fits a 49B-parameter LLM with headroom for batched ASR/TTS on the same node.

Speech

NVIDIA Riva NIM

Speech NIM microservices for production ASR and TTS. On the GA path for streaming low-latency English, alongside our open-source ASR/TTS.

LLM

NVIDIA Nemotron

Llama-3.3-Nemotron-Super-49B-Instruct as Veqa's primary conversation LLM. NVIDIA-tuned for reasoning and structured output.

Serving

NVIDIA Triton Inference Server

Multi-model, multi-framework inference serving. Designed to host LLM, ASR, and TTS models behind a single endpoint.

Optimization

NVIDIA TensorRT-LLM

FP8 quantization and KV-cache optimization for low first-token latency on the 49B-class LLM.

Orchestration

NVIDIA Pipecat

Open-source voice-agent framework (NVIDIA-promoted) for ASR ↔ LLM ↔ TTS orchestration, turn-taking, and barge-in.

One operating surface.

Operating a product company means operating a manageable number of moving parts. Both Veqa and Vigilium share the same infrastructure baseline — the same hardware SKU, the same inference runtime, the same security model.

For customers, this means a single audit surface. For Machplace, it means one stack to harden, monitor, and upgrade.

Both products run on the same NVIDIA RTX PRO 6000 Blackwell GPU SKU.
Both products are built around NVIDIA Triton + TensorRT-LLM for inference serving.
Both products run open-weight LLMs (no proprietary model dependencies).
Both products are containerized for identical on-prem deployment.
Both products share the same observability and security baseline.

Per-product deep dives.

Each product page details its own model choices, latency budget, and deployment topology.

Veqa technology

Asterisk + AudioSocket, custom async orchestration, faster-whisper / Parakeet / SenseVoice for ASR, F5-TTS / Kokoro / CosyVoice 2 for TTS, Nemotron-49B / Qwen 2.5 for LLM.

Full architecture →

Vigilium technology

Continuous infrastructure scanners, LLM-generated plain-English reports, A–F grading rubric, compliance evidence collection. Detailed architecture on the Vigilium site.

Visit vigilium.ai →

Want to talk shop?

We're happy to dive into the technical details with prospective partners, customers, and collaborators.

Get in Touch