Update README.md
kmfreyberg authored Apr 12, 2023
1 parent 0a02cdb commit 662dcaf
Showing 1 changed file with 15 additions and 11 deletions.
26 changes: 15 additions & 11 deletions README.md
@@ -1,13 +1,14 @@
# 🐶 Bark

<a href="http://www.repostatus.org/#active"><img src="http://www.repostatus.org/badges/latest/active.svg" /></a>
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1eJfA2XUa-mXwdMy7DoYKVYHI1iTd9Vkt?usp=sharing)
[![Twitter](https://img.shields.io/twitter/url/https/twitter.com/OnusFM.svg?style=social&label=Follow%20%40OnusFM)](https://twitter.com/OnusFM)
[![](https://dcbadge.vercel.app/api/server/J2B2vsjKuE?compact=true&style=flat)](https://discord.gg/J2B2vsjKuE)
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1eJfA2XUa-mXwdMy7DoYKVYHI1iTd9Vkt?usp=sharing)


[Examples](https://suno-ai.notion.site/Bark-Examples-5edae8b02a604b54a42244ba45ebc2e2) | [Model Card](./model-card.md)

-Bark is a transformer-based text-to-audio model created by [Suno](https://suno.ai). It can generate highly realistic multilingual speech, other audio, including music and background noise, and speaker emotions like laughing, sighing and crying. To support the community we give access to pretrained model checkpoints ready for inference.
+Bark is a transformer-based text-to-audio model created by [Suno](https://suno.ai). Bark can generate highly realistic, multilingual speech as well as other audio - including music, background noise and simple sound effects. The model can also produce nonverbal communications like laughing, sighing and crying. To support the research community, we are providing access to pretrained model checkpoints ready for inference.

<p align="center">
<img src="https://user-images.githubusercontent.com/5068315/230698495-cbb1ced9-c911-4c9a-941d-a1a4a1286ac6.png" width="500"></img>
@@ -33,7 +34,7 @@ Audio(audio_array, rate=SAMPLE_RATE)

### 🌎 Foreign Language

-Bark supports various languages out-of-the-box and automatically determines language from input text. Code-switched text will even realistically use the same voice and add an accent.
+Bark supports various languages out-of-the-box and automatically determines language from input text. When prompted with code-switched text, Bark will even attempt to employ the native accent for the respective languages in the same voice.

```python
text_prompt = """
@@ -47,7 +48,8 @@ audio_array = generate_audio(text_prompt)

### 🎶 Music

-Bark can generate all types of audio, and in principle doesn't see a difference between speech and music. Sometimes it chooses to generate text as music, but you can help it out by adding notes around your lyrics.
+Bark can generate all types of audio, and, in principle, doesn't see a difference between speech and music. Sometimes Bark chooses to generate text as music, but you can help it out by adding music notes around your lyrics.

```python
text_prompt = """
♪ In the jungle, the mighty jungle, the lion barks tonight ♪
@@ -59,7 +61,7 @@ audio_array = generate_audio(text_prompt)

### 👥 Speaker Prompts

-You can provide certain speaker prompts such as NARRATOR, MAN, WOMAN, etc. (Note that these are not always respected, especially if a conflicting audio history prompt is given.)
+You can provide certain speaker prompts such as NARRATOR, MAN, WOMAN, etc. Please note that these are not always respected, especially if a conflicting audio history prompt is given.

```python
text_prompt = """
@@ -73,7 +75,9 @@ audio_array = generate_audio(text_prompt)

### 🎤 Voice/Audio Cloning

-Bark has the capability to fully clone voices as well pick up music, ambience, etc. from input clips. However, to avoid misuse of this technology we limit the audio history prompts to a limited set of Suno-provided, fully synthetic options to choose from.
+Bark has the capability to fully clone voices - including tone, pitch, emotion and prosody. The model also attempts to preserve music, ambient noise, etc. from input audio. However, to mitigate misuse of this technology, we limit the audio history prompts to a limited set of Suno-provided, fully synthetic options to choose from.



```python
text_prompt = """
@@ -104,7 +108,7 @@ Bark has been tested and works on both CPU and GPU (`pytorch 2.0+`, CUDA 11.7 an
Running Bark requires running >100M parameter transformer models.
On modern GPUs and PyTorch nightly, Bark can generate audio in roughly realtime. On older GPUs, default colab, or CPU, inference time might be 10-100x slower.

-If you don't have new hardware available or if you want to play with bigger versions of our models, you can also sign up for early access to our Studio [here](https://3os84zs17th.typeform.com/suno-studio).
+If you don't have new hardware available or if you want to play with bigger versions of our models, you can also sign up for early access to our model playground [here](https://3os84zs17th.typeform.com/suno-studio).

## ⚙️ Details

@@ -149,14 +153,14 @@ Below is a list of some known non-speech sounds, but we are finding more every d

## 🙏 Appreciation

-- [nanoGPT](https://github.com/karpathy/nanoGPT) for a dead-simple and blazing fast implementation of gpt-style models
+- [nanoGPT](https://github.com/karpathy/nanoGPT) for a dead-simple and blazing fast implementation of GPT-style models
- [EnCodec](https://github.com/facebookresearch/encodec) for a state-of-the-art implementation of a fantastic audio codec
- [AudioLM](https://github.com/lucidrains/audiolm-pytorch) for very related training and inference code
- [Vall-E](https://arxiv.org/abs/2301.02111), [AudioLM](https://arxiv.org/abs/2209.03143) and many other ground-breaking papers that enabled the development of Bark

## © License

-Bark is licensed under a non-commercial CC-BY 4.0 NC. The Suno models themselves may be used commercially. However, this version of Bark uses `EnCodec` as a neural codec backend, which is licensed under a [non-commercial license](https://github.com/facebookresearch/encodec/blob/main/LICENSE).
+Bark is licensed under a non-commercial license: CC-BY 4.0 NC. The Suno models themselves may be used commercially. However, this version of Bark uses `EnCodec` as a neural codec backend, which is licensed under a [non-commercial license](https://github.com/facebookresearch/encodec/blob/main/LICENSE).

Please contact us at `[email protected]` if you need access to a larger version of the model and/or a version of the model you can use commercially.

@@ -167,6 +171,6 @@ Please contact us at `[email protected]` if you need access to a larger version of th

## 🎧 Suno Studio (Early Access)

-We’re developing a web interface for our models, including Bark.
+We’re developing a playground for our models, including Bark.

-You can sign up for early access [here](https://3os84zs17th.typeform.com/suno-studio).
+If you are interested, you can sign up for early access [here](https://3os84zs17th.typeform.com/suno-studio).
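
The Music and Speaker Prompts hunks in this diff describe two prompt conventions: wrapping lyrics in music notes, and prefixing lines with speaker labels such as NARRATOR, MAN or WOMAN. Below is a minimal sketch of those conventions as plain string helpers. The names `as_lyrics` and `as_dialogue` are hypothetical and not part of Bark, and the `SPEAKER: text` layout is an assumption, since the diff truncates the README's own example.

```python
from typing import Iterable, Tuple

NOTE = "♪"  # the music-note character used in the README's lyrics example


def as_lyrics(line: str, note: str = NOTE) -> str:
    """Wrap one line of lyrics in music notes (hypothetical helper)."""
    # Drop surrounding whitespace and any notes already present, so the
    # helper does not double-wrap text that is already marked as lyrics.
    text = line.strip().strip(note).strip()
    return f"{note} {text} {note}"


def as_dialogue(turns: Iterable[Tuple[str, str]]) -> str:
    """Join (speaker, text) pairs into one prompt, one turn per line.

    Assumption: a "SPEAKER: text" layout. The README only states that
    labels such as NARRATOR, MAN and WOMAN can be provided.
    """
    return "\n".join(f"{speaker.upper()}: {text.strip()}" for speaker, text in turns)


lyrics_prompt = as_lyrics("In the jungle, the mighty jungle, the lion barks tonight")
dialogue_prompt = as_dialogue([("woman", "Hello there."), ("man", "Hi!")])

# With Bark installed, either string would then be passed to the call
# shown in the hunk headers above:
#   audio_array = generate_audio(lyrics_prompt)
```

As the README notes, speaker prompts are not always respected, so these helpers only shape the text and cannot guarantee a given voice or singing style.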
