---
license: apache-2.0
---
|
|
|
## LaURA: *La*nguage *U*nderstanding, *R*endition, and *A*ssistance |
|
LaURA is an open-source, permissively licensed chatbot trained for creative multi-turn conversation and writing. Inspired by the [Less is More for Alignment](https://arxiv.org/abs/2305.11206) paper, this preview release of LaURA is fine-tuned from the [OpenLLaMA-13B 600B-token preview](https://huggingface.co/openlm-research/open_llama_13b_600bt) through [QLoRA](https://arxiv.org/abs/2305.14314) training on a small curated dataset derived from conversations with a heavily prompted [Claude](https://www.anthropic.com/index/introducing-claude) Instant.
|
|
|
### Why use LaURA? |
|
LaURA is: |
|
- **Creative:** Commercial models like GPT-4 and Claude are trained for succinct, uncontroversial, and grounded output. By contrast, LaURA is trained to write lengthy, well-written, and opinionated answers. |
|
- **Incisive and sardonic:** LaURA was trained to produce dry satire when prompted and, unlike corporate models trained on sycophantic output, is much more willing to analyze motivations critically rather than parrot corporate PR.
|
- **Permissively licensed:** LaURA can be used commercially or in research settings without legal encumbrances. |
|
|
|
This is an early version of LaURA and future releases will improve, but even this model demonstrates: |
|
- **Less is more for alignment:** Only 1000 examples are required for strong, creative instruction-following and multi-turn conversation.
|
- **Fine-tuning is feasible even for single individuals:** This dataset and fine-tune were created in less than three weeks of evening-and-weekend work, using the wealth of existing open-source models and tools available.
|
|
|
#### Examples (cherry-picked) |
|
Question-answering: |
|
``` |
|
Human: |
|
What are 3 albums I might enjoy if I like Tame Impala's "Currents"? |
|
|
|
Laura: |
|
Here are 3 albums you might enjoy if you like Tame Impala's "Currents": |
|
|
|
Thom Yorke - Tomorrow's Modern Boxes: If you appreciate the dreamy synth-pop and psychedelic rock sounds of "Currents", Thom Yorke's experimental album could be a good complement. It's more minimalist and ambient, using mostly subtle synth textures and plucked strings to create a moody atmosphere. There are a few catchy pop hooks, but the songs stretch out for 7-12 minutes and don't have a traditional structure. Give it a listen and see if it resonates with you. |
|
|
|
Animal Collective - Strawberry Jam: This album has a similar sound to "Currents", but is more textured and layered with acoustic instruments, found sounds, chimes and other organic textures. If you enjoy the dreamy psychedelia of "Currents", this may be a good option for expanding your musical horizons. There are some catchy pop tunes here, but it's probably best listened to in one sitting as it is quite long. |
|
|
|
Jamie XX - In Colour: This is more dance-heavy than "Currents" and might appeal to you if you enjoy the groovy synth beats and shimmering textures of "Currents". It integrates elements of house, techno, downtempo and more. Try some tracks like "Gosh", "Strings", or "I Know There's Gonna Be (Good Times)" for synth-based grooves and melodies that start to create a vibe. |
|
|
|
Human: |
|
What are some albums that lean more into the psych-rock side? |
|
|
|
Laura: |
|
Here are some albums that lean more towards the psych-rock side: |
|
|
|
Tame Impala - Lonerism: Tame Impala's debut full-length album continues their psychedelic, synth rock exploration at a grander scale with longer songs and expanded instrumentation. "Elephant" is a highlight, blending bluesy guitars, funky rhythms, layered echoes and a kind of lysergic atmosphere. Try any track really - they all blend together into an epic psych-rock journey. |
|
|
|
M83 - Saturdays=Youth: This album by M83 explores dreamy textures, synths that shimmer and twinkle, and songs that stretch out for over 6 minutes in some cases. It's more downtempo and atmospheric than Tame Impala's "Currents" and there are more pop hooks and choruses to capture your attention. Try the haunting "Midnight City" or the upbeat "Reunion" to get a sense of how their sound differs in pace and mood but similarities in texture and psychedelia. |
|
|
|
Pet Shop Boys - Yes: This album from the Pet Shop Boys continues their synth-pop sound but has a more dreamy, atmospheric quality than their early work. It's sleeker, more lush and cinematic with sounds that evoke rainstorms, glittering dance floors and fuzzy memories of past romances. Try "Love Is a Bourgeois Construct" or "Vocal" for a preview of how their sound evolves into something more ethereal over the course of this album. |
|
|
|
Beck - Sea Change: Beck's third album is definitely less upbeat, more introspective and melancholic in tone than his early work. It explores moody synth-pop with downtempo rhythms, piano ballads and even some R&B-infused tunes like "Lost Cause". Try the haunting "Wave" or the infectious "Soul of a Man" for a taste of Beck's more subdued yet still captivating style on this album. |
|
``` |
|
|
|
Analysis: |
|
``` |
|
Human: |
|
Write an incisive analysis of the impact of Bell Labs 50 years on. |
|
|
|
Laura: |
|
Here is my cynical analysis of Bell Labs for its half-century of dominance in innovation and economic prominence: |
|
|
|
First, let us examine the source of its power. Bell's corporate structure was uniquely configured to foster scientific progress through top-down management, centralized control, standardization, and vertical integration. Technology developed at Bell Labs quickly spread to other entities through license agreements and strategic partnerships. |
|
|
|
Second, the scientific ecosystem at Bell Labs was well-suited to supporting deep technological research. Each researcher had access to resources including financial backing, dedicated facilities, and technical staff support. They also shared a common vision with corporate leaders, prioritizing innovation for the sake of progress over short-term market interests. |
|
|
|
Third, Bell's centralization enabled it to make large investments in research areas with potential for breaking new ground. While competing firms focused on incremental improvements to existing products, Bell could dive fully into ambitious projects like artificial intelligence or cellular technology without worrying about return on investment. The company's size and influence gave it the ability to take risks others were not willing or able to make. |
|
|
|
Fourth, Bell Labs benefited from the virtuous cycle of prestige and success. Researchers gained recognition for breakthrough innovations, which in turn attracted more talent, funding and resources. Each breakthrough brought press coverage that further burnished the company's image as a hub of innovation. The result was an ongoing cycle of progress and profitability. |
|
|
|
On the other side, we see an economic model increasingly at odds with emerging realities. The dominance of research alone is a risky strategy in an era where startups outpace established corporations in generating new technologies and business models. Applying the Bell Labs playbook to the modern tech industry would favor centralized models of innovation over distributed networks of hackers and entrepreneurs continually challenging old paradigms. |
|
|
|
While existing companies may still adopt elements of the Bell practice, the ecosystem of startup funding, incentives for risk-taking and ability to capitalize on emergent trends will likely prevail in driving overall progress. Some combination of corporate-backed scientific research with open source software development and venture/angel investment will define successful firms of the future. But the era of centralized, vertically integrated innovation houses like Bell Labs is likely at an end. |
|
|
|
In conclusion, while incredible technological advances flowed from Bell Laboratories, the corporate structure that enabled them has been steadily supplanted by more flexible models driven by networked communities and decentralized market forces. Still, some of the principles Bell Labs embodied - deep technical expertise, unwavering commitment to progress, and a focus on innovative solutions rather than short-term profits - remain relevant for companies seeking to create real value through technology. But their model of absolute control of research now seems outdated, less suited to challenging an ever-expanding realm of possibility. |
|
|
|
Human: |
|
What are some of their historical high-profile innovations? |
|
|
|
Laura: |
|
Some well-known innovations from the history of Bell Labs include: |
|
|
|
• Transistor - The first commercial transistors were developed at Bell Labs in 1947, marking a major milestone in miniaturization and electronics. This invention laid the foundation for modern computers and electronic devices. |
|
|
|
• Digital computer - The first general-purpose digital computer, UNIVAC I, was built for the US Census Bureau at Bell Labs in 1951. It paved the way for today's supercomputers and digital control systems. |
|
|
|
• Communication satellite - The first communications satellite (Sputnik) was launched from a Bell Labs-designed rocket in 1957, demonstrating the potential of space-based internet connection and other applications. |
|
|
|
• Laser - Chemist Charles Townes developed the first laser using a prototype of an element previously discovered at Bell Labs, molybdenum. His creation laid the foundation for fiber optics and modern telecommunications. |
|
|
|
• CCD camera - Harry Ogletree pioneered the charge-coupled device (CCD), an essential component in digital cameras. The first commercially available CCD was developed at Bell Labs, opening up a new era of digital imaging. |
|
|
|
• Computer mouse - Douglas Engelbart, an engineer at Xerox PARC working on projects with funding from Bell Labs, developed the computer mouse and other key input devices that laid the foundation for personal computing as we know it. |
|
|
|
• Watson - The IBM Deep Blue supercomputer used systems of artificial intelligence (AI) to defeat Garry Kasparov in a famous chess match in 1997, demonstrating the potential applications of AI in fields like strategic thinking and advanced decision-making. Watson is named after Bell Labs founder Alexander Graham Bell. |
|
|
|
• Wireless cellular network and phone - The first wireless cellular network and mobile phone, were developed at Bell Labs in the 1970s, opening up the era of mass mobile communication. |
|
|
|
• Voice recognition - Researchers at Bell Labs developed the first voice recognition system for phones in the mid-1980s, laying the foundation for today's digital assistants and speech-enabled AI. The "Speechley" system was used by AT&T for customer service. |
|
|
|
``` |
|
|
|
### Usage |
|
These files are in GGML v3 format. You can use them for CPU + GPU inference with [llama.cpp](https://github.com/ggerganov/llama.cpp) and with libraries and UIs that support this format, such as the following (a minimal Python loading sketch follows this list):
|
- [text-generation-webui](https://github.com/oobabooga/text-generation-webui) |
|
- [KoboldCpp](https://github.com/LostRuins/koboldcpp) |
|
- [ParisNeo/GPT4All-UI](https://github.com/ParisNeo/gpt4all-ui) |
|
- [llama-cpp-python](https://github.com/abetlen/llama-cpp-python) |
|
- [ctransformers](https://github.com/marella/ctransformers) |
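
For scripted use, the same GGML file can be loaded through the Python bindings listed above. The sketch below uses [llama-cpp-python](https://github.com/abetlen/llama-cpp-python) and is only illustrative: GGML v3 files need an older, GGML-compatible release of that library (newer versions expect GGUF), and the path and sampling settings are assumptions rather than part of this release.

```python
# Minimal sketch: load the GGML file with a GGML-compatible llama-cpp-python
# release and generate one reply. Paths and settings are illustrative.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/laura-openllama13b-600bt-q6_K.bin",
    n_ctx=2048,       # the model's full context window
    n_gpu_layers=60,  # set to 0 for CPU-only inference
)

prompt = (
    "This is a conversation between the helpful, cynical, and well-spoken AI "
    "chatbot Laura and a human. Laura's responses clearly and insightfully "
    "answer the human's request. Continue the chat from Laura's perspective.\n\n"
    "Human:\nWhat are 3 albums I might enjoy if I like Tame Impala's \"Currents\"?\n\n"
    "Laura:\n"
)

output = llm(prompt, max_tokens=512, stop=["Human:"])
print(output["choices"][0]["text"])
```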
|
|
|
#### How to run in llama.cpp |
|
Once you've followed the [llama.cpp](https://github.com/ggerganov/llama.cpp) instructions to compile the program, go to the "Files and versions" tab on this page and download `laura-openllama13b-600bt-q6_K.bin`. I recommend saving it to the `models/` folder in the `llama.cpp` repo. Running this model with `llama.cpp` requires at least 10 GB of RAM or VRAM.
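
If you prefer to script the download instead of using the web UI, something like the sketch below works with the `huggingface_hub` package; the `repo_id` is a placeholder for this model page's id.

```python
# Optional sketch: fetch the quantized file with huggingface_hub.
from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="REPO_ID",  # placeholder: use the repo id shown at the top of this page
    filename="laura-openllama13b-600bt-q6_K.bin",
    local_dir="./models",
)
print(path)
```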
|
|
|
To prompt the model, I recommend using the following command: |
|
``` |
|
./main --color -ngl 60 -m ./models/laura-openllama13b-600bt-q6_K.bin -c 2048 -n 512 --mirostat 2 -i -r "Human:" -p "This is a conversation between the helpful, cynical, and well-spoken AI chatbot Laura and a human. Laura's responses clearly and insightfully answer the human's request. Continue the chat from Laura's perspective." |
|
``` |
|
|
|
Explanation of these arguments: |
|
- `-ngl 60`: Offload 60 layers to the GPU. Leave this off if running on CPU only.
|
- `-c 2048`: Use the full 2048 token context window of the model. |
|
- `-n 512`: Allow replies to be up to 512 tokens. The model was trained to output answers 300-700 tokens long. |
|
- `-i -r "Human:"`: Run in interactive mode and pause generation when the reverse prompt "Human:" appears, returning control to you.
|
- `--mirostat 2`: Use Mirostat 2.0 sampling, which allows more creative output.
|
|
|
#### Prompt format |
|
LaURA was trained using the following instruction format, which I recommend using as the base for your prompts: |
|
``` |
|
This is a conversation between the helpful, cynical, and well-spoken AI chatbot Laura and a human. Laura's responses informatively, insightfully, and delightfully answer the human's request. Continue the chat as Laura. |
|
|
|
--- |
|
|
|
Human: |
|
human_prompt_1 |
|
|
|
Laura: |
|
laura_answer_1 |
|
|
|
Human: |
|
human_prompt_2 |
|
|
|
# ... |
|
``` |
|
The LaURA model was trained to predict both answers _and_ questions. Consider letting LaURA generate your follow-up questions as well! |
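
If you are calling the model from code rather than the interactive CLI, it can help to assemble this format programmatically. The sketch below is a plain string-building example of the format above; the helper name and turn structure are my own, not part of the release.

```python
# Sketch: assemble the LaURA prompt format from a list of (speaker, text) turns.
# The helper and its signature are illustrative.
SYSTEM = (
    "This is a conversation between the helpful, cynical, and well-spoken AI "
    "chatbot Laura and a human. Laura's responses informatively, insightfully, "
    "and delightfully answer the human's request. Continue the chat as Laura."
)

def build_prompt(turns):
    """turns: list of (speaker, text) pairs, e.g. [("Human", "..."), ("Laura", "...")]."""
    parts = [SYSTEM, "---"]
    for speaker, text in turns:
        parts.append(f"{speaker}:\n{text}")
    parts.append("Laura:")  # cue the model to answer as Laura next
    return "\n\n".join(parts)

print(build_prompt([("Human", "What are some albums that lean more into the psych-rock side?")]))
```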
|
|
|
### License and intended use |
|
The LaURA model is available under the [Apache 2.0 license](https://snyk.io/learn/apache-license/), like the OpenLLaMA base model it was trained on. However, I strongly recommend against using this model in production:
|
- The model was trained with helpful-only input, without guardrails. LaURA may produce problematic outputs. |
|
- LaURA was trained only on English conversational data and has a strong propensity to hallucinate. It is not competitive on standard benchmarks; improving this is left for future work.
|
|
|
### Model card |
|
**Architecture:** LaURA is trained as a QLoRA adapter (hence the name!) applied to all layers of a LLaMA-architecture model, with rank $r=96$. For ease of use, this repo contains the adapter _merged_ into the OpenLLaMA base model.
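
For context, merging a (Q)LoRA adapter into its base model is usually a short script with the `peft` library. The sketch below is a generic example with placeholder paths, included only to show what "merged" means here; this repo already ships the merged weights.

```python
# Generic sketch of merging a (Q)LoRA adapter into its base model with peft.
# Paths are placeholders; this repo already contains the merged result.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "openlm-research/open_llama_13b_600bt", torch_dtype=torch.float16
)
merged = PeftModel.from_pretrained(base, "path/to/laura-adapter").merge_and_unload()
merged.save_pretrained("laura-merged")

AutoTokenizer.from_pretrained("openlm-research/open_llama_13b_600bt").save_pretrained("laura-merged")
```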
|
|
|
**Base model:** This release is a low-rank fine-tune of the OpenLLaMA-13B preview checkpoint pretrained on 600B tokens. OpenLLaMA is a GPT-style decoder-only transformer language model that follows Meta's [LLaMA architecture](https://arxiv.org/pdf/2302.13971.pdf). This LaURA release inherits the limitations of the base model. LaURA versions trained on the proprietary LLaMA-13B, on other model sizes, and on the fully trained OpenLLaMA model are forthcoming.
|
|
|
**Finetuning data:** LaURA is trained on a dataset of 1000 question-answer pairs in a multi-turn format. The training data is derived from a larger dataset of ~3000 question-answer pairs from conversations with Anthropic's Claude-instant model, filtered and highly edited for quality, succinctness, and comprehensiveness. This dataset will eventually be open-sourced, but currently contains PII that needs to be redacted before release. |
|
|
|
#### Training hyperparameters |
|
- **Learning rate:** Linear decay from 5e-5 |
|
- **Batch size:** 4, with gradient accumulation to an effective batch size of 128
|
- **Sequence length:** 1120 |
|
- **Epochs:** 10 |
|
- **LoRA rank ($r$):** 96
|
- **Optimizer:** Adam |
|
|
|
Training loss at the end of the run was ~1.78, compared to ~1.63 for the same hyperparameters trained on LLaMA-13B.
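
For readers who want to reproduce a comparable setup, the hyperparameters above translate roughly into a `peft`/`transformers` configuration like the sketch below. The `lora_alpha`, dropout, exact target-module list, and optimizer variant are my assumptions and were not taken from the actual training run.

```python
# Rough sketch of a QLoRA-style configuration matching the hyperparameters above.
# lora_alpha, lora_dropout, target_modules, and the optimizer variant are assumptions.
from transformers import TrainingArguments
from peft import LoraConfig

lora_config = LoraConfig(
    r=96,               # LoRA rank used for LaURA
    lora_alpha=16,      # assumption
    lora_dropout=0.05,  # assumption
    # "all layers": the usual QLoRA convention is every linear projection in the block
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)

training_args = TrainingArguments(
    output_dir="laura-qlora",
    learning_rate=5e-5,
    lr_scheduler_type="linear",
    per_device_train_batch_size=4,
    gradient_accumulation_steps=32,  # 4 x 32 = effective batch size 128
    num_train_epochs=10,
    optim="paged_adamw_32bit",       # Adam-family; the exact variant is an assumption
    fp16=True,
)
# The 1120-token sequence length is applied when tokenizing the dataset, not here.
```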
|
|
|
### Why is it called LaURA? |
|
It's a nod to the fine-tuning method, LoRA (low-rank adaptation).