SAS Hackathon Winner: Combating Misinformation with Trusted Data
Using artificial intelligence to verify and classify trustworthy information
Winning four awards in the SAS 2024 Hackathon, Butterfly Data’s AI tool helps combat misinformation by assessing the trustworthiness of open-source data using NLP and classification models. The system assigns a trust score to news sources and provides explanations via an LLM, enhancing transparency and empowering informed, reliable decision-making.
The Butterfly Challenge
With news events constantly flooding media channels, misinformation and disinformation have become serious issues, influencing decisions and potentially causing harm. A prime example is the spread of misinformation during public campaigns, where misleading narratives can sway opinions.
Analysts often spend excessive time verifying the reliability of their data sources, ensuring they are not unintentionally using unreliable information. This verification process is crucial, particularly for the public sector, where decision-making must be based on accurate and ethical data.
Butterfly Data, as a B-Corp, is committed to ethical data use and therefore sought to develop a solution to streamline this process as part of the 2024 SAS Hackathon. The aim was to empower analysts with a tool that efficiently assesses data trustworthiness while maintaining transparency and accuracy.
Our Hackathon team developed an AI tool designed to assess the trustworthiness of open-source data. Using natural language processing (NLP) and classification models, the system analyses news sources and assigns each a trustworthiness score. A large language model (LLM) then provides contextual explanations, enhancing transparency and educating users on the credibility of their information.
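As a rough illustration of how a classification model can turn text into a trustworthiness score, the sketch below trains a small Naive Bayes classifier and reports the probability of the "trusted" class as the score. It is not the Hackathon implementation: the source does not say which models were used, so Naive Bayes is a stand-in, and the class name `NaiveBayesTrustScorer`, the labels, and the toy training sentences are all hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical toy training data, invented purely for illustration.
TRAINING_DATA = [
    ("officials confirmed the figures in a published report", "trusted"),
    ("the study was peer reviewed and cites its data sources", "trusted"),
    ("the agency published the full data and methodology", "trusted"),
    ("shocking secret they do not want you to know", "untrusted"),
    ("you will not believe this miracle cure doctors hate", "untrusted"),
    ("share now before this shocking truth gets deleted", "untrusted"),
]

LABELS = ("trusted", "untrusted")


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayesTrustScorer:
    """Multinomial Naive Bayes with add-one smoothing (an assumed stand-in model)."""

    def fit(self, examples):
        self.word_counts = {label: Counter() for label in LABELS}
        self.doc_counts = Counter()
        for text, label in examples:
            self.doc_counts[label] += 1
            self.word_counts[label].update(tokenize(text))
        self.vocab = set().union(*self.word_counts.values())
        return self

    def _log_joint(self, tokens, label):
        """Return log P(label) + sum of log P(token | label) for known tokens."""
        counts = self.word_counts[label]
        denominator = sum(counts.values()) + len(self.vocab)
        log_prob = math.log(self.doc_counts[label] / sum(self.doc_counts.values()))
        for token in tokens:
            if token in self.vocab:  # words never seen in training are skipped
                log_prob += math.log((counts[token] + 1) / denominator)
        return log_prob

    def trust_score(self, text):
        """Return P(trusted | text) as a score between 0 and 1."""
        tokens = tokenize(text)
        log_trusted = self._log_joint(tokens, "trusted")
        log_untrusted = self._log_joint(tokens, "untrusted")
        # Two-class normalisation, written as a logistic function of the log-odds.
        return 1.0 / (1.0 + math.exp(log_untrusted - log_trusted))


scorer = NaiveBayesTrustScorer().fit(TRAINING_DATA)
for headline in (
    "the agency published a report citing its data",
    "shocking miracle cure they do not want deleted",
):
    print(f"{scorer.trust_score(headline):.2f}  {headline}")
```

A score near 1 indicates language resembling the "trusted" examples and a score near 0 the opposite. Text containing no known words falls back to the class prior, which is 0.5 for this balanced toy set.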
To ensure maximum usability and impact, the solution was developed using Agile methodology and incorporated leading technologies, including:
SAS Viya for advanced analytics and machine learning
Workbench, Viya Copilot and Python for efficient model development
Git integration and VS Code to enable seamless collaboration
Centralised data in CAS to maintain accuracy and keep dashboards up to date
The system’s dashboard provides users with a comprehensive summary of trustworthy sources, allowing them to make well-informed decisions.
Additionally, Information Data Catalog APIs enhance transparency by tracing data lineage and quality, giving users a clear understanding of the origins and reliability of their data.
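The per-source summary that the dashboard presents could be derived from article-level scores along the following lines. This is a minimal sketch under stated assumptions: the function `summarise_sources`, the 0.7 threshold, the two-article minimum, and the example domains are hypothetical and do not come from the Hackathon solution.

```python
from collections import defaultdict
from statistics import mean


def summarise_sources(article_scores, threshold=0.7, min_articles=2):
    """Group (source, score) pairs by source and label each source.

    threshold and min_articles are illustrative defaults, not values
    taken from the original system.
    """
    by_source = defaultdict(list)
    for source, score in article_scores:
        by_source[source].append(score)

    summary = []
    for source, scores in by_source.items():
        average = mean(scores)
        if len(scores) < min_articles:
            status = "insufficient data"  # too few articles to judge the source
        elif average >= threshold:
            status = "trusted"
        else:
            status = "review"
        summary.append(
            {
                "source": source,
                "articles": len(scores),
                "mean_score": round(average, 2),
                "status": status,
            }
        )
    # Highest-scoring sources first, as a dashboard table might list them.
    return sorted(summary, key=lambda row: row["mean_score"], reverse=True)


# Hypothetical article-level scores keyed by placeholder domains.
example_scores = [
    ("a.example", 0.9),
    ("a.example", 0.8),
    ("b.example", 0.2),
    ("b.example", 0.4),
    ("c.example", 0.95),
]
for row in summarise_sources(example_scores):
    print(row)
```

Flagging sources with too few scored articles separately keeps a single high-scoring article from marking an otherwise unknown source as trusted.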
Delivering Reliability Scores using Natural Language Processing
The misinformation detection tool gives users a structured, scalable, and efficient way to verify data reliability.
By offering a dashboard that summarises source credibility and highlighting data lineage and quality, Butterfly Data can, in principle, equip analysts to filter out unreliable sources with confidence.
This innovative approach not only helps mitigate the spread of misinformation but also ensures that decision-makers in various sectors, including government and media, can base their insights on ethical and accurate data.