Exploring the Power of AI in Community Directories: A Research Journey to Leverage Large Language Models with Domain Adaptation

AI Industry Project: Integration of BERT, PyTorch and Large Language Models (LLM) in AI engineering and Natural Language Processing (NLP)

Volunteer Camille Sze Pui Ko chose to undertake her Master's in AI and Machine Learning industry research project with Connecting Up over 2023 to improve the SAcommunity community information directory. The project uses AI to provide content for the description field (520 - Description Note), creating a natural language summary for each service, based on the existing data elements within that record, across 14,500 records in the directory.
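As a rough illustration of the input side of this task, the sketch below joins a record's existing data elements into one passage that a summarization model could read, and flags records whose description field is still empty. The element names (`name`, `services`, `eligibility`, `hours`) are hypothetical placeholders, since the directory's actual schema is not given here; only the 520 field comes from the project description.

```python
from typing import Dict, List

# Hypothetical element names; the real directory schema is not given in this article.
SOURCE_FIELDS: List[str] = ["name", "services", "eligibility", "hours"]
DESCRIPTION_FIELD = "520"  # Description Note, the field the project populates


def build_model_input(record: Dict[str, str]) -> str:
    """Join the non-empty source elements into one labelled text passage."""
    parts = []
    for field in SOURCE_FIELDS:
        value = record.get(field, "").strip()
        if value:  # skip elements that are missing or blank
            parts.append(f"{field}: {value}")
    return " | ".join(parts)


def needs_description(record: Dict[str, str]) -> bool:
    """Return True when the description field is absent or blank."""
    return not record.get(DESCRIPTION_FIELD, "").strip()


record = {"name": "Example Centre", "services": "Food relief", "hours": " "}
model_input = build_model_input(record)
```

Labelling each element in the passage is one possible design choice: it keeps the source of each fact visible to the model and to anyone reviewing the generated summary.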

AI Industry Project Outcomes
Camille's work allows for the population of a new user-friendly content field that gives a brief general description of an organisation's purpose, programs and services, helping to make each record identifiable. This streamlines what would otherwise be a time-consuming manual process for content creation. It also provides another pathway for data cleansing, as the AI-generated summary offers insights into missing or incorrect data elements elsewhere within an individual record.

AI Industry Project Impact
Directory improvements benefit the community by making it easier to search for, identify and retrieve relevant information for referrals, communication, consultation and research, so that people, providers and decision-makers can find the social and assistive services that can help them.

In addition, the content designation used to characterise the data elements within records, which supports manipulation of that data, is based on the MARC 21 Format for Community Information, a format maintained by the Library of Congress and a widely used standard for the representation of community information. Consequently, Camille's project has far-reaching applications for other information services, directories and public libraries.

AI Industry Project - Discover More
Camille's AI Project YouTube presentation
Camille's GitHub Link

Image: Research Design Model 

AI Industry Project Technical Overview
This project focused on advancing Natural Language Processing (NLP) and AI engineering by integrating BERT (a Large Language Model (LLM)), while also incorporating concepts of domain adaptation and extensions of the transformer architecture.

By leveraging Python and PyTorch for model development, this initiative used fine-tuning to adapt transformers to a specific domain, enhancing the model's relevance and effectiveness across varied datasets. Furthermore, an extension to the transformer model was implemented to support summarization tasks: a summarization token was integrated to guide the model in generating concise summaries of extensive documents.
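The idea of a summarization token can be sketched in plain Python without any model code. The example below is only an assumption about how such a token might be wired in: the token name `[SUM]`, its position at the end of the sequence, and the toy vocabulary are all hypothetical, and the project's actual implementation may differ.

```python
from typing import Dict, List

SUM_TOKEN = "[SUM]"  # hypothetical name; the article does not name the token


def add_special_token(vocab: Dict[str, int], token: str) -> int:
    """Register a token in the vocabulary and return its id (idempotent)."""
    if token not in vocab:
        vocab[token] = len(vocab)  # assumes ids are contiguous from 0
    return vocab[token]


def encode_for_summary(words: List[str], vocab: Dict[str, int], max_len: int = 16) -> List[int]:
    """Encode as [CLS] words... [SEP] [SUM], truncating the document if needed."""
    unk = vocab["[UNK]"]
    budget = max_len - 3  # room for [CLS], [SEP] and the summarization token
    body = [vocab.get(w.lower(), unk) for w in words[:budget]]
    return [vocab["[CLS]"], *body, vocab["[SEP]"], vocab[SUM_TOKEN]]


# Toy vocabulary standing in for a real BERT word-piece vocabulary.
vocab = {"[PAD]": 0, "[UNK]": 1, "[CLS]": 2, "[SEP]": 3, "food": 4, "relief": 5}
sum_id = add_special_token(vocab, SUM_TOKEN)
ids = encode_for_summary("Food relief for families".split(), vocab)
```

With the Hugging Face `transformers` library, the equivalent step would typically be to register the token through the tokenizer's `add_special_tokens` method and then call `resize_token_embeddings` on the model so the new token gets its own embedding row. Whether the project used that library is not stated here.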

Git was employed for version control, ensuring organized and collaborative development, while Docker facilitated containerization, simplifying deployment and scaling. The technical framework was designed for robust data management and processing, making it well-suited to handling the large volume of records within the SAcommunity open data database. This approach showcased current methodologies in NLP and demonstrated the project's commitment to pushing the boundaries of AI engineering.

Discover more 
Camille's Portfolio

The SAcommunity website is licensed under a
Creative Commons Attribution 3.0 Australia Licence. © Copyright