
One NVIDIA-accelerated stack. Two products.
Veqa and Vigilium share the same GPU hardware, inference runtime, and compliance baseline. Single stack to operate, audit, and scale — different problem spaces it serves.
Core capabilities.
Purpose-built infrastructure for private, compliant AI inference.
NVIDIA GPU Infrastructure
NVIDIA RTX PRO 6000 Blackwell GPUs powering high-throughput, low-latency LLM inference and Voice AI workloads.
Zero Third-Party Cloud
All inference runs on dedicated, U.S.-based hardware. Customer data never touches a third-party server.
Dedicated, Isolated Compute
All inference runs on dedicated NVIDIA GPUs in U.S. facilities, isolated per customer at the container and process level — no shared multi-tenant inference with other vendors' workloads.
Open-Source Models
Powered by leading open-source LLMs — Qwen, Llama, and Nemotron — with no vendor lock-in.
Standards-Based APIs
OpenAI-compatible inference endpoints, AudioSocket for telephony, REST + WebSocket for orchestration. No proprietary protocols.
Compliance-Ready Architecture
Architected to support HIPAA, SOC 2, and PCI DSS deployments. BAA available on request. Zero third-party data subprocessors in the inference path.
The NVIDIA stack we build on.
We build on the NVIDIA product line, not around it. Below is the NVIDIA software and hardware our architecture is built on — NVIDIA GPUs and CUDA in the PoC today, with the full serving stack on our path to GA.
NVIDIA RTX PRO 6000 Blackwell
96 GB GDDR7 ECC, 24,064 CUDA cores. Workstation and Max-Q variants supported. A single-card GPU that fits a 49B-parameter LLM with headroom for batched ASR/TTS on the same node.
NVIDIA Riva NIM
Speech NIM microservices for production ASR and TTS. On the GA path for streaming low-latency English, alongside our open-source ASR/TTS.
NVIDIA Nemotron
Llama-3.3-Nemotron-Super-49B-Instruct as Veqa's primary conversation LLM. NVIDIA-tuned for reasoning and structured output.
NVIDIA Triton Inference Server
Multi-model, multi-framework inference serving. Designed to host LLM, ASR, and TTS models behind a single endpoint.
NVIDIA TensorRT-LLM
FP8 quantization and KV-cache optimization for low first-token latency on the 49B-class LLM.
NVIDIA Pipecat
Open-source voice-agent framework (NVIDIA-promoted) for ASR ↔ LLM ↔ TTS orchestration, turn-taking, and barge-in.
One operating surface.
Operating a product company means operating a manageable number of moving parts. Both Veqa and Vigilium share the same infrastructure baseline — the same hardware SKU, the same inference runtime, the same security model.
For customers, this means a single audit surface. For Machplace, it means one stack to harden, monitor, and upgrade.
- Both products run on the same NVIDIA RTX PRO 6000 Blackwell GPU SKU.
- Both products are built around NVIDIA Triton + TensorRT-LLM for inference serving.
- Both products run open-weight LLMs (no proprietary model dependencies).
- Both products are containerized for identical on-prem deployment.
- Both products share the same observability and security baseline.
Per-product deep dives.
Each product page details its own model choices, latency budget, and deployment topology.
Veqa technology
Asterisk + AudioSocket, custom async orchestration, faster-whisper / Parakeet / SenseVoice for ASR, F5-TTS / Kokoro / CosyVoice 2 for TTS, Nemotron-49B / Qwen 2.5 for LLM.
Full architecture →Vigilium technology
Continuous infrastructure scanners, LLM-generated plain-English reports, A–F grading rubric, compliance evidence collection. Detailed architecture on the Vigilium site.
Visit vigilium.ai →Want to talk shop?
We're happy to dive into the technical details with prospective partners, customers, and collaborators.