{"id":20738,"date":"2026-03-27T05:20:39","date_gmt":"2026-03-27T05:20:39","guid":{"rendered":"https:\/\/www.jobsinnaija.com\/?post_type=job_listing&#038;p=20738"},"modified":"2026-03-28T22:33:54","modified_gmt":"2026-03-28T22:33:54","slug":"odixcity-consulting-worldwide-full-time-ai-evaluation-specialist","status":"publish","type":"job_listing","link":"https:\/\/www.jobsinnaija.com\/?job_listing=odixcity-consulting-worldwide-full-time-ai-evaluation-specialist","title":{"rendered":"AI Evaluation Specialist"},"content":{"rendered":"<p>Job Summary<\/p>\n<p>&nbsp;<\/p>\n<p>We are looking for a sharp, detail-oriented senior specialist to architect the systems that measure and improve our generative AI models.<\/p>\n<p>You will work at the intersection of data science, product, and research to ensure our AI systems are not only accurate but also safe, unbiased, and aligned with human preferences.<\/p>\n<p>Key Responsibilities<\/p>\n<p>&nbsp;<\/p>\n<p>Design and implement robust automated evaluation frameworks (using Python) to test LLMs on tasks like reasoning, coding, and summarization.<\/p>\n<p>Lead the development of annotation rubrics and manage workflows for human evaluators to generate high-context preference data and golden datasets.<\/p>\n<p>Design and execute adversarial testing (red teaming) to identify vulnerabilities, hallucinations, and biases in model outputs before deployment.<\/p>\n<p>Develop and calibrate reliable LLM-based evaluators to replace human raters at scale for specific metrics, validating their correlation with human judgment.<\/p>\n<p>Analyze evaluation results to pinpoint specific model weaknesses (e.g. 
model fails at multi-step reasoning in finance contexts) and present actionable insights to modeling and product teams.<\/p>\n<p>Build and maintain internal evaluation platforms and dashboards to track model performance across different versions and use cases.<\/p>\n<p>Requirements<\/p>\n<p>&nbsp;<\/p>\n<p>A degree in Computer Science, Information Technology, Data Science, or a related field.<\/p>\n<p>5+ years of experience in machine learning, data science, or AI evaluation.<\/p>\n<p>Proven track record of designing evaluation strategies for NLP or Generative AI products.<\/p>\n<p>Expert-level proficiency in Python for scripting evaluations and analyzing results (pandas, NumPy).<\/p>\n<p>Strong ability to query data (SQL) and perform statistical analysis to validate evaluation confidence intervals and inter-annotator agreement.<\/p>\n<p>Advanced ability to craft prompts for both model testing and steering LLM-based evaluators.<\/p>\n","protected":false},"author":790,"featured_media":0,"template":"","meta":{"_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"_uf_show_specific_survey":0,"_uf_disable_surveys":false,"_promoted":"","_job_location":"Worldwide","_application":"jobscandidate8@gmail.com","_company_name":"Odixcity 
Consulting","_company_website":"","_company_tagline":"","_company_twitter":"","_company_video":"","_filled":0,"_featured":0,"_remote_position":1,"_job_salary_currency":"","_job_salary_unit":""},"job-types":[3],"class_list":{"0":"post-20738","1":"job_listing","2":"type-job_listing","3":"status-publish","4":"hentry","5":"job_listing_type-full-time","7":"job-type-full-time"},"_links":{"self":[{"href":"https:\/\/www.jobsinnaija.com\/index.php?rest_route=\/wp\/v2\/job-listings\/20738","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.jobsinnaija.com\/index.php?rest_route=\/wp\/v2\/job-listings"}],"about":[{"href":"https:\/\/www.jobsinnaija.com\/index.php?rest_route=\/wp\/v2\/types\/job_listing"}],"author":[{"embeddable":true,"href":"https:\/\/www.jobsinnaija.com\/index.php?rest_route=\/wp\/v2\/users\/790"}],"wp:attachment":[{"href":"https:\/\/www.jobsinnaija.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=20738"}],"wp:term":[{"taxonomy":"job_listing_type","embeddable":true,"href":"https:\/\/www.jobsinnaija.com\/index.php?rest_route=%2Fwp%2Fv2%2Fjob-types&post=20738"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}