AI Engineer / Developer

  • Full Time
  • Lagos

Website: Jedayah AI

Post-Delivery Monitoring:

Set up observability for all live AI systems: workflow error rates, LLM API latency and cost, token consumption per task, retrieval hit rates, and end-to-end task completion rates
Build centralized error-logging pipelines: all workflow failures written to Supabase with full context (input, state, error, timestamp), with Teams alerts to the ops team
Monitor confidence score distributions on live classification pipelines to detect and investigate distribution shift that may indicate model or data drift
Use LLM observability tools (LangSmith, Helicone, or Langfuse) to trace agent runs, inspect intermediate steps, and identify failure patterns in production
Respond to client-reported issues: diagnose root causes from workflow logs and agent traces, deploy fixes within agreed SLAs, and communicate clearly with clients throughout
Produce monthly performance reports per client: tasks processed, errors caught, confidence distributions, uptime, cost per task, and recommended optimizations
Proactively identify degradation in model or retrieval quality before clients notice; review evals regularly and propose retraining or prompt updates as needed
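The centralized error-logging requirement above can be sketched roughly as follows. This is an illustrative pattern, not Jedayah's actual implementation: the `FAILURE_LOG` list and `log_failure` helper are hypothetical stand-ins for a real Supabase table insert and a Teams webhook call.

```python
import traceback
from datetime import datetime, timezone

# Stand-in for a Supabase "failures" table; a real pipeline would insert
# rows via the Supabase client and post an alert to a Teams webhook.
FAILURE_LOG: list[dict] = []

def log_failure(workflow: str, input_payload: dict, state: dict, error: Exception) -> dict:
    """Record a workflow failure with full context: input, state, error, timestamp."""
    record = {
        "workflow": workflow,
        "input": input_payload,
        "state": state,
        "error": f"{type(error).__name__}: {error}",
        "traceback": traceback.format_exc(),
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }
    FAILURE_LOG.append(record)  # real system: write to Supabase, then alert ops via Teams
    return record

def run_workflow(payload: dict) -> str:
    """Toy workflow step whose failures are captured with full context."""
    state = {"step": "classify"}
    try:
        return payload["text"].upper()
    except Exception as exc:
        log_failure("classifier", payload, state, exc)
        raise
```

The key property is that every failure row carries enough context (input, intermediate state, error, timestamp) to diagnose the root cause from logs alone, without reproducing the run.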
Tool & Vendor Management:

Manage the engineering stack
Track LLM provider updates, model releases, deprecations, pricing changes, context window expansions, and new capabilities, and proactively assess their impact on live systems
Evaluate emerging agentic frameworks, vector store options, and automation platforms as they mature; recommend adoption with clear rationale and migration plans
Manage API usage budgets across all LLM and infrastructure vendors: monitor spend, flag anomalies, optimize model selection and caching strategies to control costs
Maintain a secure secrets and credentials management system across all environments: API keys, service accounts, OAuth tokens, and database credentials
Liaise directly with vendor support and developer relations teams for integration issues, early access to new features, and technical escalations
Maintain an internal tool registry: for every tool in the stack, document its purpose, owner, cost, alternative options, and replacement plan in the event of deprecation or failure
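The budget-monitoring duty above ("monitor spend, flag anomalies") might look something like this in miniature. The 1.5× trailing-average threshold is an assumed illustrative policy, not a stated company rule:

```python
from statistics import mean

def flag_spend_anomalies(daily_spend: dict[str, list[float]],
                         multiplier: float = 1.5) -> list[str]:
    """Flag vendors whose latest daily spend exceeds `multiplier` times their
    trailing average. The 1.5x multiplier is an assumed example policy."""
    flagged = []
    for vendor, history in daily_spend.items():
        if len(history) < 2:
            continue  # not enough history to establish a baseline
        *trailing, latest = history
        if latest > multiplier * mean(trailing):
            flagged.append(vendor)
    return flagged
```

A production version would pull spend from each vendor's billing API on a schedule and route flags into the same alerting channel as workflow errors.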
Requirements
Technical Skills:

Degree or certification in Computer Science, Software Engineering, Information Technology, or a related field
3+ years of software engineering experience, with at least 2 years building production AI or LLM-powered systems
Strong proficiency in Python and/or JavaScript (Node.js): async programming, API integration, data transformation, and working with AI SDKs
Experience building with LLMs: prompting, tool use / function calling, structured outputs, and streaming responses
Hands-on experience with LangChain or LlamaIndex: chains, agents, retrievers, memory, and tool integrations
Hands-on experience with LangGraph: stateful agent design, node/edge graphs, conditional routing, and multi-agent orchestration
Working experience with vector databases: Pinecone, Weaviate, pgvector, Chroma, or FAISS
Experience with REST APIs, webhooks, and system integrations: authentication, retry logic, and rate-limit handling
Familiarity with automation platforms: n8n, Make, or equivalent
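The retry logic and rate-limit handling named above typically reduce to a small exponential-backoff loop. A minimal sketch, where `RateLimitError` is a hypothetical stand-in for an HTTP 429 from any LLM or SaaS API:

```python
import time

class RateLimitError(Exception):
    """Stand-in for an HTTP 429 response from an API."""

def with_retries(fn, max_attempts: int = 4, base_delay: float = 0.01):
    """Call fn(), retrying on RateLimitError with exponential backoff.
    Delays are tiny here for demonstration; production code would use
    seconds-scale delays, add jitter, and honor Retry-After headers."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # budget exhausted: surface the error to the caller
            time.sleep(base_delay * (2 ** attempt))
```

Libraries such as tenacity package this pattern, but interviewers commonly expect candidates to be able to write it from scratch.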
AI / ML Knowledge:

Solid understanding of RAG architectures: chunking strategies, embedding models, retrieval methods, and re-ranking
Working knowledge of embeddings, semantic search, and vector similarity, including trade-offs between embedding models
Understanding of LLM limitations: hallucinations, context window constraints, prompt sensitivity, and output non-determinism
Experience evaluating and improving AI system performance: offline evals, production monitoring, and iterative optimization
Exposure to open-source LLMs (Llama 3, Mistral, Qwen) and self-hosting inference (Ollama, vLLM, or TGI)
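The vector-similarity knowledge listed above boils down to ranking stored embeddings by cosine similarity against a query embedding. A toy retrieval step with hand-written vectors (real systems would use an embedding model plus a vector database such as pgvector or Pinecone):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: dot product over the product of vector norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query: list[float], chunks: dict[str, list[float]], k: int = 2) -> list[str]:
    """Return the IDs of the k chunks most similar to the query embedding."""
    ranked = sorted(chunks, key=lambda cid: cosine(query, chunks[cid]), reverse=True)
    return ranked[:k]
```

The trade-off the posting alludes to: different embedding models produce vectors in incompatible spaces, so a corpus must be re-embedded whenever the model changes.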
Infrastructure & DevOps:

Experience deploying on cloud platforms: AWS, GCP, or Azure
Working knowledge of Docker and CI/CD pipelines for AI workload deployment
Familiarity with monitoring and observability tools such as LangSmith, Langfuse, Helicone, or Arize Phoenix
Experience with Supabase or PostgreSQL for application data and pgvector for embedding storage
Nice to Have:

Experience with agent frameworks: AutoGen, CrewAI, or custom multi-agent orchestration patterns
Experience with workflow orchestration tools: Temporal, Celery, or BullMQ for long-running async processes
Familiarity with model fine-tuning: LoRA/QLoRA, dataset curation, and evaluation of fine-tuned outputs
Experience with hybrid search: combining dense vector retrieval with BM25 / sparse retrieval and cross-encoder re-ranking
Experience with Termii / Twilio WhatsApp Business API for conversational agent delivery
Background in building SaaS products, internal tools, or multi-tenant applications with data isolation
Experience working in a startup or agency environment, comfortably context-switching across multiple client projects
Understanding of security, rate limiting, secrets management, and production reliability practices
Contributions to open-source AI agent frameworks, LLM tooling, or automation projects
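The hybrid-search item in the list above (dense retrieval fused with BM25 / sparse retrieval) is often implemented with reciprocal rank fusion. A minimal sketch; the constant k=60 is the value commonly used in practice, and the ranked ID lists stand in for real dense and sparse retriever outputs:

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse multiple ranked ID lists (e.g. one from dense vector retrieval,
    one from BM25) by summing 1 / (k + rank) per document across lists."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; a cross-encoder re-ranker could refine this list.
    return sorted(scores, key=scores.get, reverse=True)
```

RRF needs only ranks, not comparable scores, which is why it is a popular way to combine retrievers whose raw scores live on different scales.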

To apply for this job email your details to careers@jedayahAI.com