SLIM Models - Function Calling with Small Language Models
Function calling is generally a specialized capability of frontier language models, such as OpenAI's GPT-4.
We have adapted this concept to small language models through SLIMs (Structured Language Instruction Models), which are ‘single function’ models fine-tuned to construct a prompt from three main inputs, described under Inputs below.
As of June 2024, there are 18 distinct SLIM function-calling models, with many more on the way, covering most common extraction, classification, and summarization tasks.
Models List
If you would like more information about any of the SLIM models, please check out their model cards (a short sketch after the list shows how several of these tools plug into the same calling pattern):
- extract - extract custom keys - slim-extract & slim-extract-tool
- summary - summarize function call - slim-summary & slim-summary-tool
- xsum - title/headline function call - slim-xsum & slim-xsum-tool
- ner - extract named entities - slim-ner & slim-ner-tool
- sentiment - evaluate sentiment - slim-sentiment & slim-sentiment-tool
- topics - generate topic - slim-topics & slim-topics-tool
- sa-ner - combo model (sentiment + named entities) - slim-sa-ner & slim-sa-ner-tool
- boolean - provides a yes/no output with explanation - slim-boolean & slim-boolean-tool
- ratings - apply 1 (low) - 5 (high) rating - slim-ratings & slim-ratings-tool
- emotions - assess emotions - slim-emotions & slim-emotions-tool
- tags - auto-generate list of tags - slim-tags & slim-tags-tool
- tags-3b - enhanced auto-generation tagging model - slim-tags-3b & slim-tags-3b-tool
- intent - identify intent - slim-intent & slim-intent-tool
- category - high-level category - slim-category & slim-category-tool
- nli - assess if evidence supports conclusion - slim-nli & slim-nli-tool
- sql - convert text into sql - slim-sql & slim-sql-tool
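Because each SLIM is a ‘single function’ model, tools from the list above can be swapped into the same two lines of calling code. A short sketch, assuming llmware is installed (the passage is illustrative only):

```python
from llmware.models import ModelCatalog

text = ("Tesla shares rose 5% on Monday after the company reported "
        "record quarterly deliveries.")

# run the same passage through several single-function SLIM tools
for tool_name in ["slim-sentiment-tool", "slim-ner-tool", "slim-topics-tool", "slim-tags-tool"]:
    model = ModelCatalog().load_model(tool_name)
    response = model.function_call(text)
    print(tool_name, "->", response["llm_response"])
```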
You may also want to check out these quantized ‘answer’ tools, which work well in conjunction with SLIMs for question answering and summarization (a minimal usage sketch follows the list):
- bling-stablelm-3b-tool - 3b quantized RAG model - bling-stablelm-3b-gguf
- bling-answer-tool - 1b quantized RAG model - bling-answer-tool
- dragon-yi-answer-tool - 6b quantized RAG model - dragon-yi-answer-tool
- dragon-mistral-answer-tool - 7b quantized RAG model - dragon-mistral-answer-tool
- dragon-llama-answer-tool - 7b quantized RAG model - dragon-llama-answer-tool
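Unlike the SLIMs, the answer tools are used with the standard inference method and a context passage. A minimal sketch, assuming llmware is installed (the passage and question are illustrative):

```python
from llmware.models import ModelCatalog

passage = ("Adobe's revenue grew 11% year over year in the quarter, "
           "which ended March 1, according to a statement.")

# answer tools take a question plus a context passage, rather than a function call
model = ModelCatalog().load_model("bling-answer-tool")
response = model.inference("What was the revenue growth?", add_context=passage)

print(response["llm_response"])
```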
All SLIM models share a common prompting structure.

Inputs:
- text passage - the core passage or piece of text that you would like the model to assess
- function - classify, extract, generate - handled by default by the model class, so it usually does not need to be explicitly declared, but it is an option for SLIMs that support more than one function
- params - depends upon the model; used to configure/guide the behavior of the function call - optional for some SLIMs

Outputs:
- structured Python output, generally either a dictionary or a list

Main objectives:
- enable function calling with small, locally running models
- simplify prompts by defining specific functions and fine-tuning the model to respond accordingly, without ‘prompt magic’
- standardize outputs so that they can be handled programmatically as part of a multi-step workflow
```python
from llmware.models import ModelCatalog


def discover_slim_models():

    """ Discover the list of SLIM tools in the Model Catalog.

    -- SLIMs are available in both traditional PyTorch and quantized GGUF packages.
    -- Generally, we train/fine-tune in PyTorch and then package in 4-bit quantized GGUF for inference.
    -- By default, we designate the GGUF versions with 'tool' or 'gguf' in their names.
    -- GGUF versions are generally faster to load, faster for inference, and use less memory
       in most environments. """

    tools = ModelCatalog().list_llm_tools()
    tool_map = ModelCatalog().get_llm_fx_mapping()

    print("\nList of SLIM model tools (GGUF) in the ModelCatalog\n")

    for i, tool in enumerate(tools):

        model_card = ModelCatalog().lookup_model_card(tool_map[tool])

        print(f"{i} - tool: {tool} - "
              f"model_name: {model_card['model_name']} - "
              f"model_family: {model_card['model_family']}")

    return 0
```
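As the docstring notes, each SLIM function ships in two packages, and both can be loaded by name from the catalog - a small sketch (model names taken from the list above):

```python
from llmware.models import ModelCatalog

# the same sentiment function in two packages
pt_model = ModelCatalog().load_model("slim-sentiment")         # traditional PyTorch version
gguf_model = ModelCatalog().load_model("slim-sentiment-tool")  # 4-bit quantized GGUF version
```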
```python
def hello_world_slim():

    """ SLIM models can be identified in the ModelCatalog like any llmware model. Instead of the standard
    inference method, SLIM models are used with the function_call method, which prepares a special prompt
    instruction and takes optional parameters.

    This example shows a series of function calls with different SLIM models.

    Please note that the first time they are used, the models will be pulled from the llmware
    Hugging Face repository, which can take a couple of minutes. Future calls will be much faster
    once the models are cached locally. """

    print("\nExecuting Function Call Inferences with SLIMs\n")

    #   Sentiment Analysis
    passage1 = ("This is one of the best quarters we can remember for the industrial sector "
                "with significant growth across the board in new order volume, as well as price "
                "increases in excess of inflation. We continue to see very strong demand, especially "
                "in Asia and Europe. Accordingly, we remain bullish on the tier 1 suppliers and would "
                "be accumulating more stock on any dips.")

    #   here are the two key lines of code
    model = ModelCatalog().load_model("slim-sentiment-tool")
    response = model.function_call(passage1)

    print("sentiment response: ", response['llm_response'])

    #   Named Entity Recognition
    passage2 = "Michael Johnson was a famous Olympic sprinter from the U.S. in the early 2000s."

    model = ModelCatalog().load_model("slim-ner-tool")
    response = model.function_call(passage2)

    print("ner response: ", response['llm_response'])

    #   Extract anything with slim-extract
    passage3 = ("Adobe shares tumbled as much as 11% in extended trading Thursday after the design software maker "
                "issued strong fiscal first-quarter results but came up slightly short on quarterly revenue guidance. "
                "Here’s how the company did, compared with estimates from analysts polled by LSEG, formerly known as Refinitiv: "
                "Earnings per share: $4.48 adjusted vs. $4.38 expected Revenue: $5.18 billion vs. $5.14 billion expected "
                "Adobe’s revenue grew 11% year over year in the quarter, which ended March 1, according to a statement. "
                "Net income decreased to $620 million, or $1.36 per share, from $1.25 billion, or $2.71 per share, "
                "in the same quarter a year ago. During the quarter, Adobe abandoned its $20 billion acquisition of "
                "design software startup Figma after U.K. regulators found competitive concerns. The company paid "
                "Figma a $1 billion termination fee.")

    model = ModelCatalog().load_model("slim-extract-tool")
    response = model.function_call(passage3, function="extract", params=["revenue growth"])

    print("extract response: ", response['llm_response'])

    #   Generate questions with slim-q-gen
    model = ModelCatalog().load_model("slim-q-gen-tiny-tool", temperature=0.2, sample=True)

    #   supported params - "question", "multiple choice", "boolean"
    response = model.function_call(passage3, params=['multiple choice'])

    print("question generation response: ", response['llm_response'])

    #   Generate topics
    model = ModelCatalog().load_model("slim-topics-tool")
    response = model.function_call(passage3)

    print("topics response: ", response['llm_response'])

    #   Generate headline summary with slim-xsum
    model = ModelCatalog().load_model("slim-xsum-tool", temperature=0.0, sample=False)
    response = model.function_call(passage3)

    print("xsum response: ", response['llm_response'])

    #   Generate boolean (yes/no) answer, with optional '(explain)' added to the parameter
    model = ModelCatalog().load_model("slim-boolean-tool")
    response = model.function_call(passage3, params=["Did Adobe revenue increase? (explain)"])

    print("boolean response: ", response['llm_response'])

    #   Generate tags
    model = ModelCatalog().load_model("slim-tags-tool", temperature=0.0, sample=False)
    response = model.function_call(passage3)

    print("tags response: ", response['llm_response'])

    return 0
```
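The other SLIMs in the list follow the same pattern with their own params. For example, a sketch with slim-summary-tool, assuming the 'key points (3)' parameter form used in other llmware summary examples:

```python
from llmware.models import ModelCatalog

text = ("Adobe's revenue grew 11% year over year in the quarter. Net income decreased "
        "to $620 million from $1.25 billion in the same quarter a year ago.")

model = ModelCatalog().load_model("slim-summary-tool", sample=False, temperature=0.0)

# params of the form 'key points (3)' ask for a list of three key points -
# parameter form assumed from other llmware summary examples
response = model.function_call(text, function="summarize", params=["key points (3)"])

print("summary response: ", response["llm_response"])
```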
```python
def using_logits_and_integrating_into_process():

    """ This example shows two key elements of function calling with SLIM models -

    1.  Using logit information to indicate confidence levels, especially for classifications.
    2.  Using the structured dictionary generated for programmatic handling in a larger process. """

    print("\nExample: using logits and integrating into process\n")

    text_passage = ("On balance, this was an average result, with earnings in line with expectations and "
                    "no big surprises to either the positive or the negative.")

    #   two key lines (load_model + execute function_call) + additional logit_analysis step
    sentiment_model = ModelCatalog().load_model("slim-sentiment-tool", get_logits=True)
    response = sentiment_model.function_call(text_passage)
    analysis = ModelCatalog().logit_analysis(response, sentiment_model.model_card,
                                             sentiment_model.hf_tokenizer_name)

    print("sentiment response: ", response['llm_response'])

    print("\nAnalyzing response")
    for key, value in analysis.items():
        print(f"{key} - {value}")

    #   two key attributes of the sentiment output dictionary
    sentiment_value = response["llm_response"]["sentiment"]
    confidence_level = analysis["confidence_score"]

    #   use the sentiment classification as an 'if...then' decision point in a process
    if "positive" in sentiment_value:
        print("sentiment is positive .... will take 'positive' analysis path ...", sentiment_value)
    else:
        print("sentiment is negative .... will take 'negative' analysis path ...", sentiment_value)

    if "positive" in sentiment_value and confidence_level > 0.8:
        print("sentiment is positive with high confidence ... ", sentiment_value, confidence_level)

    return 0
```
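The confidence score also makes it easy to gate a larger workflow. Below is a hypothetical helper (not part of the example above) that accepts a classification only when the confidence score clears a threshold, and otherwise returns None to signal a fallback path:

```python
from llmware.models import ModelCatalog

def classify_with_threshold(text, threshold=0.8):

    """ Hypothetical helper - returns the sentiment only if the confidence score clears
    the threshold; otherwise returns None to signal a fallback path (e.g., a larger
    model or human review). """

    model = ModelCatalog().load_model("slim-sentiment-tool", get_logits=True)
    response = model.function_call(text)
    analysis = ModelCatalog().logit_analysis(response, model.model_card, model.hf_tokenizer_name)

    if analysis["confidence_score"] > threshold:
        return response["llm_response"]["sentiment"]

    return None
```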
```python
if __name__ == "__main__":

    #   discovering slim models in the llmware catalog
    discover_slim_models()

    #   running function call inferences
    hello_world_slim()

    #   doing interesting stuff with the output
    using_logits_and_integrating_into_process()
```