Large Language Models

In a world filled with data, language is your key to making sense of it all. It turns complex information into simple insights, answers questions in a snap, and helps you understand the emotions behind words.

Bluemist AI integrates seamlessly with Hugging Face Transformers, empowering users to accomplish various natural language processing tasks effortlessly. With this integration, you can perform document question answering, generate insightful responses to questions, summarize lengthy text for easier understanding, and analyze sentiment within text. The wrapper simplifies the utilization of powerful models, enhancing productivity and enabling effective interpretation of textual data.

class TaskModels

Bases: object

Class representing a collection of tasks and their associated models. It serves as a powerful wrapper for Hugging Face models, streamlining natural language processing tasks.

It offers simplified interfaces for four key functions:

Document Question Answering
Question Answering
Summarization
Sentiment Analysis

Users can initialize an instance of the class to access these functionalities effortlessly. Bluemist AI is designed to simplify complex NLP operations, making it an invaluable tool for text analysis and understanding.

get_all_tasks()

Retrieves all available tasks.

Returns: A list of all available tasks.
Return type: list

static get_models_for_task(task_name, limit)

Retrieves the available models for a given task.

Parameters:

task_name (str) – The task for which to retrieve the models.
limit (int, optional) – The maximum number of models to retrieve.

Returns:

A list of available models for the specified task.

Return type:

list
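
As a rough illustration of how a per-task lookup with a limit behaves, the sketch below uses a hypothetical in-memory registry. The EXAMPLE_MODELS mapping and the models_for_task helper are assumptions made for this example (the model names are the ones that appear in the notebook output later on this page); Bluemist AI's own lookup may work differently.

```python
from typing import Dict, List, Optional

# Hypothetical registry; the model names are copied from the notebook output
# on this page, not from Bluemist AI's internal configuration.
EXAMPLE_MODELS: Dict[str, List[str]] = {
    "document-question-answering": ["impira/layoutlm-document-qa"],
    "summarization": ["t5-small", "t5-base"],
    "sentiment-analysis": [
        "lxyuan/distilbert-base-multilingual-cased-sentiments-student",
        "ProsusAI/finbert",
    ],
}


def models_for_task(task_name: str, limit: Optional[int] = None) -> List[str]:
    """Return up to `limit` model names for `task_name` (all if limit is None)."""
    if task_name not in EXAMPLE_MODELS:
        raise ValueError(f"Unsupported task: {task_name!r}")
    if limit is not None and limit < 0:
        raise ValueError("limit must not be negative")
    models = EXAMPLE_MODELS[task_name]
    return list(models) if limit is None else models[:limit]


print(models_for_task("summarization", limit=1))
```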

perform_task(task_name, input_data, question=None, min_length=30, max_length=130, do_sample=False, override_models=None, limit=5, evaluate_models=True)

Performs the task on the given input data, evaluates the models, and returns comparison metrics.

Parameters:

task_name (str) – The task to perform. Supported tasks can be retrieved from the TaskModels class using the get_all_tasks method.
input_data (str) – Text or information used by the model to perform the NLP task.
question (str, default=None) – Specific query or question provided as input to the model for question-answering tasks. The model uses this question to find the relevant answer within the provided context.
min_length (number, default=30) – The minimum length of the generated summary. The summarization model ensures that the summary is at least this length.
max_length (number, default=130) – The maximum length of the generated summary. The summarization model limits the summary to at most this length.
do_sample (boolean, default=False) – Whether to use sampling during summary generation. When True, the model uses a sampling technique for token selection.
override_models (str or list, default=None) – Additional models to use that are not part of the pre-configured list.
limit (int, default=5) – The maximum number of models to compare.
evaluate_models (boolean, default=True) – Whether model comparison is requested. When False, limit is overridden to 1.
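
The interaction between limit, evaluate_models and override_models can be sketched as follows. This is a minimal illustration under stated assumptions, not Bluemist AI's implementation: the helper names effective_limit and normalize_override_models are hypothetical, and apart from one extra sanity check only the rules stated in the parameter list above are encoded.

```python
from typing import List, Optional, Union


def effective_limit(limit: int = 5, evaluate_models: bool = True) -> int:
    """Number of models that would be run under the documented rule."""
    if limit < 1:
        # Assumption for this sketch: a limit below 1 is treated as an error.
        raise ValueError("limit must be at least 1")
    # Documented rule: evaluate_models=False overrides limit to 1.
    return limit if evaluate_models else 1


def normalize_override_models(
    override_models: Optional[Union[str, List[str]]] = None,
) -> List[str]:
    """Accept a single model name or a list of names and return a list."""
    if override_models is None:
        return []
    if isinstance(override_models, str):
        return [override_models]
    return list(override_models)


print(effective_limit(limit=5, evaluate_models=False))
print(normalize_override_models("t5-small"))
```

Normalizing the str-or-list argument up front keeps the rest of a wrapper free of type checks, which is one common way to support both call styles.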

Document Question Answering

Document Question Answering (DQA), also known as Document Visual Question Answering, involves leveraging multi-modal features to answer questions about document images in natural language. It combines text, word positions, and images to generate meaningful responses. An illustrative example showcases DQA balancing cost efficiency with quality customer service in response to specific queries. DQA models prove versatile, adaptable to visually-rich and non-visually-rich documents, aiding in structured document parsing and invoice information extraction.

For more details, refer to https://huggingface.co/tasks/document-question-answering

Question Answering

Question Answering (QA) models provide answers to questions based on a given text, which helps with document search and with automating responses to frequently asked questions. These models can generate answers either with or without context. QA models can be used with the Hugging Face Transformers library through the question-answering pipeline, and several variants of the task exist.

For more details, refer to https://huggingface.co/tasks/question-answering
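
Extractive QA results, such as those in the notebook output further down this page, carry start and end values alongside the answer. In that output they behave as character offsets into the input text, with end exclusive (for example, 480 to 515 spans the 35 characters of the answer). The sketch below shows the relationship on a short, made-up context; the span_offsets helper is hypothetical and only mimics the offsets, it does not run a model.

```python
from typing import Tuple


def span_offsets(context: str, answer: str) -> Tuple[int, int]:
    """Return (start, end) character offsets of the first occurrence of answer."""
    start = context.find(answer)
    if start == -1:
        raise ValueError("answer does not occur in context")
    # end is exclusive, so context[start:end] reproduces the answer text.
    return start, start + len(answer)


# Illustrative context, shortened from the notebook example.
context = "The steam engine and the spinning jenny revolutionized manufacturing."
start, end = span_offsets(context, "steam engine and the spinning jenny")
print(start, end, context[start:end])
```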

Summarization

Summarization models are designed to create concise versions of given documents while preserving crucial information. The process involves extracting or generating shorter text while maintaining the essence of the original content. Users can benefit from this tool in various scenarios, such as summarizing research papers for efficient literature review, or condensing lengthy paragraphs for improved understanding. The integration with Hugging Face Transformers allows for effortless implementation and utilization of state-of-the-art summarization models. With a simple API call, users can summarize any given text using pre-trained models, making content processing and comprehension more efficient.

For more details, refer to https://huggingface.co/tasks/summarization

Sentiment Analysis

Sentiment Analysis models help users understand the sentiment conveyed in a piece of text. They classify the sentiment as positive, negative, or neutral, giving insight into the emotional tone of the content. Users can apply them across a range of applications, from social media monitoring to product review analysis, helping businesses gauge public opinion and make informed decisions. The integration connects users to state-of-the-art sentiment analysis models and simplifies running them.

For more details, refer to https://huggingface.co/blog/sentiment-analysis-python
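
A classifier of this kind typically produces one raw score (logit) per label, and the reported score is the softmax probability of the winning label. The sketch below shows that final step with made-up logits; the label order and the numbers are assumptions for illustration, not the output of any particular model.

```python
import math
from typing import List, Sequence, Tuple

LABELS = ["positive", "neutral", "negative"]  # assumed label order


def softmax(logits: Sequence[float]) -> List[float]:
    """Convert raw scores into probabilities that sum to 1."""
    highest = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(value - highest) for value in logits]
    total = sum(exps)
    return [value / total for value in exps]


def classify(logits: Sequence[float], labels: Sequence[str] = LABELS) -> Tuple[str, float]:
    """Return the most probable label and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]


label, score = classify([2.0, 0.5, -1.0])  # made-up logits
print(label, round(score, 3))
```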

Code Samples and API deployment

Jupyter notebook with code samples for document-question-answering, question-answering, summarization and sentiment-analysis

llm_jupyter_notebook

In [ ]:

pip install -U bluemist[complete]

In [ ]:

pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu

In [1]:

import warnings

warnings.filterwarnings('ignore')

In [2]:

import pytesseract
print("Pytesseract version:", pytesseract.get_tesseract_version())

Pytesseract version: 5.2.0

In [3]:

from bluemist.environment import initialize
from bluemist.llm.task_models import TaskModels
task_models = TaskModels()
print(task_models.get_all_tasks())

TESSDATA_PREFIX: /home/shashank-agrawal/anaconda3/envs/bluemist-test-1/share/tessdata/
TESSDATA_PREFIX: /home/shashank-agrawal/anaconda3/envs/bluemist-test-1/share/tessdata/
['document-question-answering', 'question-answering', 'summarization', 'sentiment-analysis']

In [4]:

from bluemist.llm.wrapper import perform_task
initialize()


    ██████╗ ██╗     ██╗   ██╗███████╗███╗   ███╗██╗███████╗████████╗     █████╗ ██╗
    ██╔══██╗██║     ██║   ██║██╔════╝████╗ ████║██║██╔════╝╚══██╔══╝    ██╔══██╗██║
    ██████╔╝██║     ██║   ██║█████╗  ██╔████╔██║██║███████╗   ██║       ███████║██║
    ██╔══██╗██║     ██║   ██║██╔══╝  ██║╚██╔╝██║██║╚════██║   ██║       ██╔══██║██║
    ██████╔╝███████╗╚██████╔╝███████╗██║ ╚═╝ ██║██║███████║   ██║       ██║  ██║██║                                                                        
                        (version 0.1.3 - WordCraft)
    
Bluemist path :: /home/shashank-agrawal/anaconda3/envs/bluemist-test-1/lib/python3.9/site-packages/bluemist
System platform :: posix, Linux, 6.2.0-35-generic, linux-x86_64, ('64bit', 'ELF')

In [5]:

## Task - Document Question Answering ##

task = "document-question-answering"

image = "https://templates.invoicehome.com/invoice-template-us-neat-750px.png"

question = "What is the invoice number?"

df_document_question_answering = perform_task(task, input_data=image, question=question, limit=1)
df_document_question_answering

Model :: impira/layoutlm-document-qa

Out[5]:

	model	score	answer	start	end
0	impira/layoutlm-document-qa	0.340383	us-001	16	16

In [6]:

## Task - Question Answering ##

task = "question-answering"

input = """The Industrial Revolution, which began in the late 18th century, had a 
profound impact on society, transforming the way people lived and worked. One of 
the most significant changes brought about by the Industrial Revolution was the 
shift from agrarian economies to industrial economies. This transition resulted in 
the rapid growth of cities as people flocked to urban areas in search of employment 
in factories. The development of new machinery and technologies, such as the steam 
engine and the spinning jenny, revolutionized manufacturing and led to increased 
productivity. However, the benefits of the Industrial Revolution were not evenly 
distributed, and many workers faced harsh working conditions, long hours, and low 
wages. The social and economic consequences of this era continue to shape our world 
today."""

question = """What were the key technological innovations of the Industrial Revolution, 
and how did they impact both the economy and the lives of workers during that time?"""

df_question_answering = perform_task(task, input_data=input, question=question, limit=5)
df_question_answering

Model :: distilbert-base-uncased-distilled-squad
Model :: deepset/roberta-base-squad2
Model :: Rakib/roberta-base-on-cuad
Model :: deepset/bert-large-uncased-whole-word-masking-squad2
Model :: distilbert-base-cased-distilled-squad

Out[6]:

	model	score	start	end	answer
0	distilbert-base-uncased-distilled-squad	0.556159	480	515	steam engine and the spinning jenny
1	deepset/roberta-base-squad2	0.542350	480	515	steam engine and the spinning jenny
2	Rakib/roberta-base-on-cuad	0.004189	0	25	The Industrial Revolution
3	deepset/bert-large-uncased-whole-word-masking-...	0.322402	480	515	steam engine and the spinning jenny
4	distilbert-base-cased-distilled-squad	0.481429	476	515	the steam engine and the spinning jenny
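
One way to read a comparison table like the one above is to rank the rows by score and to check how many models agree on the same answer text. The sketch below does this with plain Python on rows copied from Out[6]; it is an illustration, not part of Bluemist AI. Scores from different models are not calibrated against each other, so the ranking is only a rough guide.

```python
from collections import Counter
from typing import Dict, List

# Rows copied from the Out[6] table above (full model names as logged).
results: List[Dict] = [
    {"model": "distilbert-base-uncased-distilled-squad", "score": 0.556159,
     "answer": "steam engine and the spinning jenny"},
    {"model": "deepset/roberta-base-squad2", "score": 0.542350,
     "answer": "steam engine and the spinning jenny"},
    {"model": "Rakib/roberta-base-on-cuad", "score": 0.004189,
     "answer": "The Industrial Revolution"},
    {"model": "deepset/bert-large-uncased-whole-word-masking-squad2", "score": 0.322402,
     "answer": "steam engine and the spinning jenny"},
    {"model": "distilbert-base-cased-distilled-squad", "score": 0.481429,
     "answer": "the steam engine and the spinning jenny"},
]

# Highest-scoring row first.
ranked = sorted(results, key=lambda row: row["score"], reverse=True)
best = ranked[0]

# How many models returned exactly the same answer text.
votes = Counter(row["answer"] for row in results)
top_answer, top_count = votes.most_common(1)[0]

print(best["model"], "->", best["answer"])
print(top_answer, top_count)
```

If the result is a pandas DataFrame, as the notebook output suggests, df_question_answering.sort_values("score", ascending=False) would give the same ranking.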

In [7]:

## Task - Summarization ##

task = "summarization"

input = """The Industrial Revolution, which began in the late 18th century, had a 
profound impact on society, transforming the way people lived and worked. One of 
the most significant changes brought about by the Industrial Revolution was the 
shift from agrarian economies to industrial economies. This transition resulted in 
the rapid growth of cities as people flocked to urban areas in search of employment 
in factories. The development of new machinery and technologies, such as the steam 
engine and the spinning jenny, revolutionized manufacturing and led to increased 
productivity. However, the benefits of the Industrial Revolution were not evenly 
distributed, and many workers faced harsh working conditions, long hours, and low 
wages. The social and economic consequences of this era continue to shape our world 
today."""

df_summarization = perform_task(task, input_data=input, limit=2)

from pandas import option_context
with option_context('display.max_colwidth', None):
    display(df_summarization.style.set_properties(**{'text-align': 'left'}))

Model :: t5-small
Model :: t5-base

	model	summary_text
0	t5-small	the Industrial Revolution began in the late 18th century . it transformed the way people lived and worked . many workers faced harsh working conditions, long hours, and low wages .
1	t5-base	the Industrial Revolution began in the late 18th century and had a profound impact on society . the shift from agrarian economies to industrial economies led to rapid growth of cities . many workers faced harsh working conditions, long hours, and low wages .

In [8]:

## Task - Sentiment Analysis ##

task = "sentiment-analysis"

input = """The new restaurant in town has been creating quite a buzz among food 
enthusiasts.  The elegant decor, friendly staff, and a diverse menu with a wide 
range of culinary delights have been receiving rave reviews. Diners have been 
praising the exquisite flavors and presentation of the dishes. However, there 
have also been a few complaints about the wait times during peak hours. 
Overall, it seems that most customers are delighted with their dining experience 
and are looking forward to returning for more delicious meals."""

df_sentiment_analysis = perform_task(task, input_data=input, limit=2)
df_sentiment_analysis

Model :: lxyuan/distilbert-base-multilingual-cased-sentiments-student
Model :: ProsusAI/finbert

Out[8]:

	model	label	score
0	lxyuan/distilbert-base-multilingual-cased-sent...	positive	0.534604
1	ProsusAI/finbert	positive	0.858109

In [ ]:

## Deploy as API ##

from bluemist.llm import api_wrapper
api_wrapper.start_api_server()

INFO:     Started server process [19524]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://localhost:8000 (Press CTRL+C to quit)

To test the API, open the browser and navigate to http://localhost:8000/docs
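
The /docs page suggests that the server publishes an OpenAPI schema. Assuming it is a FastAPI application that serves the schema at the default location /openapi.json (an assumption, not confirmed by this page), the available endpoints could be listed from a script as sketched below. The fetch_schema and list_endpoints helpers are hypothetical, and the sample schema uses a placeholder path so the example runs without a server.

```python
import json
from typing import Dict, List
from urllib.request import urlopen

HTTP_METHODS = {"get", "put", "post", "delete", "patch", "head", "options", "trace"}


def fetch_schema(base_url: str = "http://localhost:8000") -> Dict:
    """Download the OpenAPI schema (assumes the default /openapi.json path)."""
    with urlopen(f"{base_url}/openapi.json", timeout=10) as response:
        return json.load(response)


def list_endpoints(schema: Dict) -> List[str]:
    """Return 'METHOD /path' strings for every operation in the schema."""
    endpoints = []
    for path, operations in sorted(schema.get("paths", {}).items()):
        for method in sorted(operations):
            if method in HTTP_METHODS:  # skip non-operation keys such as "parameters"
                endpoints.append(f"{method.upper()} {path}")
    return endpoints


# Placeholder schema; "/example" is made up and is not a Bluemist AI endpoint.
sample_schema = {"paths": {"/example": {"post": {}, "get": {}, "parameters": []}}}
print(list_endpoints(sample_schema))

# With the server running: print(list_endpoints(fetch_schema()))
```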

API handbook for Document Question Answering

API handbook for Question Answering

API handbook for Sentiment Analysis

API handbook for Summarization