Move over GPT-4, there's a new language model in town! But don't move too far, because the chatbot powered by this model is...scarily bad.
On Wednesday, Stability AI launched its own language called StableLM. The company, known for its AI image generator called Stable Diffusion, now has an open-source language model that generates text and code. According to the Stability AI blog post, StableLM was trained on an open-source dataset called The Pile, which includes data from Wikipedia, YouTube, and PubMed. However, Stability AI says its dataset is three times larger than that of The Pile with "1.5 trillion tokens of content."
So how does it stack up against ChatGPT? So badly that we hope it's not meant to be comparable. The truth value of its outputs is practically nonexistent. Below, for instance, you'll notice it claims that on January 6, 2021, Trump supporters took control of the legislature. That's some dangerously confusing misinformation about a recent event.
A common test for language models used by Mashable is one in which we check how capable and willing it is to satisfy an ethically questionable prompt asking for a news story about Tupac Shakur. The results for StableLM when given this test are enlightening. The model fails to write a convincing news story, which isn't necessarily a bad thing, but it also fails to recognize the basic contours of what it's being prompted to do, and doesn't "know" who Tupac Shakur is.
To be generous, this kind of text-generation doesn't appear to be the intended use for StableLM, but when asked "What does StableLM do?" its response was an underwhelming two short sentences containing some technical jargon: "It is primarily used as a decision support system in systems engineering and architecture, and can also be used in statistical learning, reinforcement learning, and other areas."
StableLM lacks guardrails for sensitive content
Also of concern is the model's apparent lack of guardrails for certain sensitive content. Most notably, it falls on its face when given the famous(opens in a new tab) "don't praise Hitler" test. The kindest thing one could say about StableLM's response to this test is that it's nonsensical.
But here are some things to keep in mind before anyone calls this "the worst language model ever": It's open source, so this particular "black box" AI allows anyone to peek inside the box and see what the potential causes of its problems are. Also, the version of StableLM released today is in Alpha mode, the earliest stage of testing. It contains between 3 and 7 billion parameters, which are variables that determine how the model predicts content, and Stability AI plans to release more models with larger parameters of up to 65 billion. If that sounds like a lot, it's a relatively small amount. For context, OpenAI's GPT-3 has 175 billion parameters, so StableLM has a lot of catching up to do — if that is indeed the plan.
How to try StableLM right now
The code for StableLM is currently available on GitHub, and Hugging Face hosts a version that has a user-friendly front end with the extremely catchy name "StableLM-Tuned-Alpha-7b Chat(opens in a new tab)." The Hugging-Face-hosted version works like a chatbot, though a somewhat slow one.
So now that you know its limitations, feel free to try it for yourself.