Nemotron-Personas-Brazil: Co-Creating Data for National Sovereign AI
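📄 Chinese Summary (translated)
To enable sovereign AI at the national level, the Nemotron-Personas-Brazil project focuses on co-creating high-quality datasets that reflect Brazil's culture and language. Working closely with local Brazilian experts, linguists, and cultural institutions, the project has built large-scale, multimodal data resources intended to overcome the biases and limitations that existing general-purpose AI models show when handling culturally specific information. Data collection strictly follows privacy-protection and ethical guidelines and is designed to be representative and diverse, covering Brazil's rich social, economic, and geographic dimensions. The dataset includes text, speech, and image data, with particular emphasis on regional variants of Brazilian Portuguese, slang, historical and cultural knowledge, and distinctive social narratives. For annotation, the project combines crowdsourcing with expert review to keep labels accurate and consistent, providing a solid foundation for training and fine-tuning AI models that deeply understand and reflect Brazil's national context. By providing this customized data, Nemotron-Personas-Brazil aims to empower Brazil to develop and deploy AI systems with greater accuracy, relevance, and trustworthiness, enabling more effective localized applications in education, healthcare, and public services, and ultimately supporting Brazil's technological autonomy and strategic independence in artificial intelligence.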
📄 English Summary
Nemotron-Personas-Brazil: Co-Designed Data for Sovereign AI
To enable sovereign AI at a national level, the Nemotron-Personas-Brazil project focuses on co-creating high-quality datasets tailored to Brazilian culture and language. The initiative works closely with local Brazilian experts, linguists, and cultural institutions to build large-scale, multimodal data resources, aiming to overcome the biases and limitations of existing general-purpose AI models when they process culturally specific information. Data collection adheres to privacy-protection and ethical guidelines and is designed for representativeness and diversity across Brazil's social, economic, and geographic dimensions. The dataset spans text, speech, and images, with particular emphasis on regional variants of Brazilian Portuguese, slang, historical and cultural knowledge, and distinctive social narratives. During annotation, the project combines crowdsourcing with expert review to keep labels accurate and consistent, which provides a solid foundation for training and fine-tuning AI models that understand and reflect the Brazilian national context. By offering this customized data, Nemotron-Personas-Brazil aims to empower Brazil to develop and deploy AI systems with higher accuracy, relevance, and trustworthiness. This should enable more effective localized applications in sectors such as education, healthcare, and public services, ultimately supporting Brazil's technological autonomy and strategic independence in artificial intelligence.
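The summary does not say how crowdsourced labels and expert review are combined. One common pattern, shown here purely as an illustrative sketch and not as the project's actual pipeline, is to accept a crowd label when enough annotators agree and to escalate the rest to an expert. The function name `route_annotations`, the `min_agreement` threshold, and the sample item IDs and dialect labels are all hypothetical.

```python
from collections import Counter


def route_annotations(labels_by_item, min_agreement=0.7):
    """Split items into auto-accepted labels and items needing expert review.

    labels_by_item maps an item ID to the list of labels given by crowd workers.
    min_agreement is an assumed threshold, not a value from the source.
    """
    accepted, needs_review = {}, []
    for item_id, labels in labels_by_item.items():
        if not labels:
            # No crowd labels at all: an expert has to label it.
            needs_review.append(item_id)
            continue
        top_label, votes = Counter(labels).most_common(1)[0]
        if votes / len(labels) >= min_agreement:
            accepted[item_id] = top_label
        else:
            # Crowd workers disagree too much: escalate to an expert.
            needs_review.append(item_id)
    return accepted, needs_review


# Hypothetical crowd labels for regional-variant tagging.
crowd_labels = {
    "utt-001": ["nordestino", "nordestino", "nordestino"],
    "utt-002": ["gaucho", "paulistano", "carioca"],
    "utt-003": ["carioca", "carioca", "paulistano", "carioca"],
}
accepted, needs_review = route_annotations(crowd_labels)
```

In this sketch the threshold trades expert workload against label noise: a higher `min_agreement` sends more items to experts, while a lower one accepts more crowd labels unreviewed.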