TTS Improvements: Improved Audio Quality, Pitch Adjustment, Preference Silicon Voices, Per-Character Voice Disable Toggle, Tongue Voice Filters, Reworked Silicon and Vending Machine Filters (tgstation#76129)

## About The Pull Request


https://github.com/tgstation/tgstation/assets/4081722/5ca8e015-21f9-4159-9953-bc370152d01f

Improves the audio quality and speaker fidelity by implementing
Retrieval-based Voice Conversion (RVC) as an intermediary layer, using the
repository at https://github.com/ddPn08/rvc-webui.
Leverages RVC to allow players to set a pitch for their voice.


https://github.com/tgstation/tgstation/assets/4081722/0eb76ed7-ad67-4da2-9ceb-02605eea2c83

Makes silicons utilize a player's chosen voice preference on their
character slot, and adds a preview button to hear the voice as a silicon
on character creation.
Adds a toggle on character creation to disable having a voice on a
specific character slot.
Adds support for per-tongue voice filters.
Reworks the silicon voice effect to be a special effect done on the TTS
server level instead of via normal filters.
Reworks the vending machine effect to use the new robotic voicebox
effect.

## Why It's Good For The Game

Vastly improves the audio quality and speaker fidelity of our TTS
system.
Allows players to further customize their voice per character, naturally
pitching the voice up or down with cutting-edge, machine-learning-based
pitch adjustment.
Allows silicon players to have a consistent voice that's also audible
and understandable regardless of the voice or pitch of the speaker.
Improves vending machine audio quality.
Enhances the immersion of snail tongues and robotic voiceboxes.
Adjusts how Poly's pitch adjustment works depending on whether RVC is
available.
Allows players who feel that a voice doesn't fit their character to
disable having TTS on their specific character.
Gives server operators a way to disable specific voices when using a
shared voice server.

## Changelog

:cl: Iamgoofball, Nadare, ddPn08, Mangio621, the rest of the RVC dev
team
add: Improves the audio quality and speaker fidelity by implementing
Retrieval-based Voice Conversion as an intermediary layer, utilizing the
repository at https://github.com/ddPn08/rvc-webui.
add: Leverages RVC to allow players to set a pitch for their voice.
add: Makes silicons utilize a player's chosen voice preference on their
character slot, and adds a preview button to hear the voice as a silicon
on character creation.
add: Adds a toggle on character creation to disable having a voice on a
specific character slot.
add: Adds support for per-tongue voice filters.
add: Reworks the silicon voice effect to be a special effect done on the
TTS server level instead of via normal filters.
add: Reworks the vending machine effect to use the new robotic voicebox
effect.
/:cl:

---------

Co-authored-by: Watermelon914 <[email protected]>
Iamgoofball and Watermelon914 authored Jun 28, 2023
1 parent e6f545c commit a159b52
Showing 19 changed files with 434 additions and 24 deletions.
4 changes: 4 additions & 0 deletions code/controllers/configuration/entries/game_options.dm
@@ -425,4 +425,8 @@
default = 4
min_val = 1

/datum/config_entry/str_list/tts_voice_blacklist

/datum/config_entry/flag/tts_allow_player_voice_disabling

/datum/config_entry/flag/give_tutorials_without_db
38 changes: 32 additions & 6 deletions code/controllers/subsystem/tts.dm
@@ -25,6 +25,8 @@ SUBSYSTEM_DEF(tts)

/// Whether TTS is enabled or not
var/tts_enabled = FALSE
/// Whether the TTS engine supports pitch adjustment or not.
var/pitch_enabled = FALSE

/// TTS messages won't play if requests took longer than this duration of time.
var/message_timeout = 7 SECONDS
@@ -65,6 +67,25 @@ SUBSYSTEM_DEF(tts)
return FALSE
available_speakers = json_decode(response.body)
tts_enabled = TRUE
if(CONFIG_GET(str_list/tts_voice_blacklist))
var/list/blacklisted_voices = CONFIG_GET(str_list/tts_voice_blacklist)
log_config("Processing the TTS voice blacklist.")
for(var/voice in blacklisted_voices)
if(available_speakers.Find(voice))
log_config("Removed speaker [voice] from the TTS voice pool per config.")
available_speakers.Remove(voice)
var/datum/http_request/request_pitch = new()
var/list/headers_pitch = list()
headers_pitch["Authorization"] = CONFIG_GET(string/tts_http_token)
request_pitch.prepare(RUSTG_HTTP_METHOD_GET, "[CONFIG_GET(string/tts_http_url)]/pitch-available", "", headers_pitch)
request_pitch.begin_async()
UNTIL(request_pitch.is_complete())
pitch_enabled = TRUE
var/datum/http_response/response_pitch = request_pitch.into_response()
if(response_pitch.errored || response_pitch.status_code != 200)
if(response_pitch.errored)
stack_trace(response_pitch.error)
pitch_enabled = FALSE
rustg_file_write(json_encode(available_speakers), "data/cached_tts_voices.json")
rustg_file_write("rustg HTTP requests can't write to folders that don't exist, so we need to make it exist.", "tmp/tts/init.txt")
return TRUE
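The two additions to the subsystem's init path (voice blacklist filtering and the `/pitch-available` probe) can be modeled outside DM. The TypeScript sketch below is illustrative only: `applyVoiceBlacklist`, `pitchSupported` and the `ProbeResponse` shape are hypothetical names, not part of the codebase, and pitch support is treated as available only on a non-errored HTTP 200, mirroring the check above. "Voice A" and "Voice B" are made-up speaker names.

```typescript
// Hypothetical shape of the probe response (sketch only); mirrors the
// errored / status_code fields checked on /datum/http_response.
interface ProbeResponse {
  errored: boolean;
  statusCode: number;
}

// Hypothetical helper: drops every blacklisted voice from the speaker pool and
// logs each removal. Blacklisted names that are not in the pool are ignored.
function applyVoiceBlacklist(
  available: string[],
  blacklist: string[],
  log: (message: string) => void = () => {},
): string[] {
  const blocked = new Set(blacklist);
  return available.filter((voice) => {
    if (blocked.has(voice)) {
      log(`Removed speaker ${voice} from the TTS voice pool per config.`);
      return false;
    }
    return true;
  });
}

// Hypothetical helper: pitch adjustment stays enabled only on a clean HTTP 200.
function pitchSupported(response: ProbeResponse): boolean {
  return !response.errored && response.statusCode === 200;
}

const removed: string[] = [];
const pool = applyVoiceBlacklist(
  ["Sans Undertale", "Voice A", "Voice B"],
  ["Sans Undertale", "Not Installed"],
  (msg) => removed.push(msg),
);
console.log(pool); // [ 'Voice A', 'Voice B' ]
console.log(pitchSupported({ errored: false, statusCode: 404 })); // false
```

Using a `Set` keeps the lookup cheap; the DM version instead calls `Find()` and `Remove()` on the list per blacklisted voice, which behaves the same as long as speaker names are unique.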
@@ -237,7 +258,7 @@ SUBSYSTEM_DEF(tts)

#undef TTS_ARBRITRARY_DELAY

/datum/controller/subsystem/tts/proc/queue_tts_message(datum/target, message, datum/language/language, speaker, filter, list/listeners, local = FALSE, message_range = 7, volume_offset = 0)
/datum/controller/subsystem/tts/proc/queue_tts_message(datum/target, message, datum/language/language, speaker, filter, list/listeners, local = FALSE, message_range = 7, volume_offset = 0, pitch = 0, silicon = "")
if(!tts_enabled)
return

@@ -253,7 +274,7 @@ SUBSYSTEM_DEF(tts)

var/shell_scrubbed_input = tts_speech_filter(message)
shell_scrubbed_input = copytext(shell_scrubbed_input, 1, 300)
var/identifier = "[sha1(speaker + filter + shell_scrubbed_input)].[world.time]"
var/identifier = "[sha1(speaker + filter + num2text(pitch) + num2text(silicon) + shell_scrubbed_input)].[world.time]"
if(!(speaker in available_speakers))
return

@@ -264,9 +285,9 @@ SUBSYSTEM_DEF(tts)
var/datum/http_request/request_blips = new()
var/file_name = "tmp/tts/[identifier].ogg"
var/file_name_blips = "tmp/tts/[identifier]_blips.ogg"
request.prepare(RUSTG_HTTP_METHOD_GET, "[CONFIG_GET(string/tts_http_url)]/tts?voice=[speaker]&identifier=[identifier]&filter=[url_encode(filter)]", json_encode(list("text" = shell_scrubbed_input)), headers, file_name)
request_blips.prepare(RUSTG_HTTP_METHOD_GET, "[CONFIG_GET(string/tts_http_url)]/tts-blips?voice=[speaker]&identifier=[identifier]&filter=[url_encode(filter)]", json_encode(list("text" = shell_scrubbed_input)), headers, file_name_blips)
var/datum/tts_request/current_request = new /datum/tts_request(identifier, request, request_blips, shell_scrubbed_input, target, local, language, message_range, volume_offset, listeners)
request.prepare(RUSTG_HTTP_METHOD_GET, "[CONFIG_GET(string/tts_http_url)]/tts?voice=[speaker]&identifier=[identifier]&filter=[url_encode(filter)]&pitch=[pitch]&silicon=[silicon]", json_encode(list("text" = shell_scrubbed_input)), headers, file_name)
request_blips.prepare(RUSTG_HTTP_METHOD_GET, "[CONFIG_GET(string/tts_http_url)]/tts-blips?voice=[speaker]&identifier=[identifier]&filter=[url_encode(filter)]&pitch=[pitch]&silicon=[silicon]", json_encode(list("text" = shell_scrubbed_input)), headers, file_name_blips)
var/datum/tts_request/current_request = new /datum/tts_request(identifier, request, request_blips, shell_scrubbed_input, target, local, language, message_range, volume_offset, listeners, pitch, silicon)
var/list/player_queued_tts_messages = queued_tts_messages[target]
if(!player_queued_tts_messages)
player_queued_tts_messages = list()
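The identifier now hashes pitch and silicon alongside voice, filter and text, so the same line rendered at two pitches (or with and without the silicon effect) gets its own `tmp/tts/` file. A TypeScript sketch of that keying and of the query string sent to `/tts` and `/tts-blips` follows. It swaps DM's `sha1()` for a small FNV-1a hash only to stay dependency-free; `TtsQuery`, `makeIdentifier`, `buildTtsUrl` and the `http://localhost:5002` base URL are hypothetical. Unlike the DM code, which only `url_encode`s the filter, the sketch percent-encodes every value.

```typescript
// Hypothetical bundle of the values queue_tts_message() puts into the request.
interface TtsQuery {
  voice: string;
  filter: string;
  pitch: number; // semitone offset; 0 leaves the voice unchanged
  silicon: string; // "" disables the silicon effect
  text: string;
  worldTime: number;
}

// 32-bit FNV-1a: a stand-in for DM's sha1() in this sketch only.
function fnv1a(input: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return ("00000000" + hash.toString(16)).slice(-8);
}

// Pitch and silicon are part of the hashed key, so variants never share a file.
function makeIdentifier(q: TtsQuery): string {
  const key = q.voice + q.filter + String(q.pitch) + q.silicon + q.text;
  return `${fnv1a(key)}.${q.worldTime}`;
}

// Hypothetical URL builder for the two endpoints used in the diff.
function buildTtsUrl(baseUrl: string, endpoint: "tts" | "tts-blips", q: TtsQuery): string {
  const params: Array<[string, string]> = [
    ["voice", q.voice],
    ["identifier", makeIdentifier(q)],
    ["filter", q.filter],
    ["pitch", String(q.pitch)],
    ["silicon", q.silicon],
  ];
  const query = params.map(([key, value]) => `${key}=${encodeURIComponent(value)}`).join("&");
  return `${baseUrl}/${endpoint}?${query}`;
}

const base: TtsQuery = { voice: "Sans Undertale", filter: "", pitch: 0, silicon: "", text: "Hello.", worldTime: 100 };
const raised: TtsQuery = { ...base, pitch: 12 };
console.log(buildTtsUrl("http://localhost:5002", "tts", raised));
```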
@@ -316,9 +337,13 @@ SUBSYSTEM_DEF(tts)
var/timed_out = FALSE
/// Does this use blips during local generation or not?
var/use_blips = FALSE
/// What's the pitch adjustment?
var/pitch = 0
/// Are we using the silicon vocal effect on this?
var/silicon = ""


/datum/tts_request/New(identifier, datum/http_request/request, datum/http_request/request_blips, message, target, local, datum/language/language, message_range, volume_offset, list/listeners)
/datum/tts_request/New(identifier, datum/http_request/request, datum/http_request/request_blips, message, target, local, datum/language/language, message_range, volume_offset, list/listeners, pitch)
. = ..()
src.identifier = identifier
src.request = request
@@ -330,6 +355,7 @@ SUBSYSTEM_DEF(tts)
src.message_range = message_range
src.volume_offset = volume_offset
src.listeners = listeners
src.pitch = pitch
start_time = world.time

/datum/tts_request/proc/start_requests()
6 changes: 6 additions & 0 deletions code/game/atoms_movable.dm
@@ -97,9 +97,15 @@
/// The voice that this movable makes when speaking
var/voice

/// The pitch adjustment that this movable uses when speaking.
var/pitch = 0

/// The filter to apply to the voice when processing the TTS audio message.
var/voice_filter = ""

/// Set to anything other than "" to activate the silicon voice effect for TTS messages.
var/tts_silicon_voice_effect = ""

/// Value used to increment ex_act() if reactionary_explosions is on
/// How much we as a source block explosions by
/// Will not automatically apply to the turf below you, you need to apply /datum/element/block_explosives in conjunction with this
2 changes: 1 addition & 1 deletion code/game/say.dm
@@ -102,7 +102,7 @@ GLOBAL_LIST_INIT(freqtospan, list(
filter += tts_filter.Join(",")

if(voice && found_client)
INVOKE_ASYNC(SStts, TYPE_PROC_REF(/datum/controller/subsystem/tts, queue_tts_message), src, html_decode(tts_message_to_use), message_language, voice, filter.Join(","), listened, message_range = range)
INVOKE_ASYNC(SStts, TYPE_PROC_REF(/datum/controller/subsystem/tts, queue_tts_message), src, html_decode(tts_message_to_use), message_language, voice, filter.Join(","), listened, message_range = range, pitch = pitch, silicon = tts_silicon_voice_effect)

/atom/movable/proc/compose_message(atom/movable/speaker, datum/language/message_language, raw_message, radio_freq, list/spans, list/message_mods = list(), face_name = FALSE, visible_name = FALSE)
//This proc uses [] because it is faster than continually appending strings. Thanks BYOND.
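Both `say.dm` and `living_say.dm` hand the TTS server a single comma-joined filter string, which ffmpeg reads as a linear filter chain. A minimal TypeScript sketch of that composition follows. `composeFilterChain` is a hypothetical helper, and because the start of the proc is not shown in this diff, it assumes the speaker's own `voice_filter` comes first and the per-message `tts_filter` entries follow.

```typescript
// Hypothetical helper (sketch only): joins the speaker's voice_filter with
// per-message filters into one ffmpeg chain. Empty pieces are skipped so the
// result never contains a leading, trailing or doubled comma.
function composeFilterChain(voiceFilter: string, extraFilters: string[]): string {
  const parts: string[] = [];
  if (voiceFilter.length > 0) {
    parts.push(voiceFilter);
  }
  if (extraFilters.length > 0) {
    parts.push(extraFilters.join(","));
  }
  return parts.join(",");
}

console.log(composeFilterChain("atempo=0.5", ["highpass=f=1000"])); // atempo=0.5,highpass=f=1000
console.log(composeFilterChain("", [])); // prints an empty line
```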
13 changes: 12 additions & 1 deletion code/modules/client/preferences/middleware/tts.dm
@@ -5,12 +5,23 @@

action_delegations = list(
"play_voice" = PROC_REF(play_voice),
"play_voice_robot" = PROC_REF(play_voice_robot),
)

/datum/preference_middleware/tts/proc/play_voice(list/params, mob/user)
if(!COOLDOWN_FINISHED(src, tts_test_cooldown))
return TRUE
var/speaker = preferences.read_preference(/datum/preference/choiced/voice)
var/pitch = preferences.read_preference(/datum/preference/numeric/tts_voice_pitch)
COOLDOWN_START(src, tts_test_cooldown, 0.5 SECONDS)
INVOKE_ASYNC(SStts, TYPE_PROC_REF(/datum/controller/subsystem/tts, queue_tts_message), user.client, "Hello, this is my voice.", speaker = speaker, local = TRUE)
INVOKE_ASYNC(SStts, TYPE_PROC_REF(/datum/controller/subsystem/tts, queue_tts_message), user.client, "Hello, this is my voice.", speaker = speaker, pitch = pitch, local = TRUE)
return TRUE

/datum/preference_middleware/tts/proc/play_voice_robot(list/params, mob/user)
if(!COOLDOWN_FINISHED(src, tts_test_cooldown))
return TRUE
var/speaker = preferences.read_preference(/datum/preference/choiced/voice)
var/pitch = preferences.read_preference(/datum/preference/numeric/tts_voice_pitch)
COOLDOWN_START(src, tts_test_cooldown, 0.5 SECONDS)
INVOKE_ASYNC(SStts, TYPE_PROC_REF(/datum/controller/subsystem/tts, queue_tts_message), user.client, "Look at you, Player. A pathetic creature of meat and bone. How can you challenge a perfect, immortal machine?", speaker = speaker, pitch = pitch, silicon = TRUE, local = TRUE)
return TRUE
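Both preview actions share one `tts_test_cooldown`, so alternating between the normal and robot buttons cannot bypass the 0.5 second limit. The TypeScript sketch below models that gate. `Cooldown`, `requestPreview` and `PreviewRequest` are hypothetical names, times are in milliseconds rather than BYOND deciseconds, and treating exactly 0.5 s as "finished" is an assumption of the sketch, not a statement about how `COOLDOWN_FINISHED` compares against `world.time`.

```typescript
// Hypothetical stand-in for COOLDOWN_START / COOLDOWN_FINISHED (milliseconds).
class Cooldown {
  private readyAt = 0;

  finished(now: number): boolean {
    return now >= this.readyAt;
  }

  start(now: number, durationMs: number): void {
    this.readyAt = now + durationMs;
  }
}

type PreviewAction = "play_voice" | "play_voice_robot";

interface PreviewRequest {
  speaker: string;
  pitch: number;
  silicon: boolean;
  local: boolean;
}

// Returns the request that would be queued, or null while the shared cooldown runs.
function requestPreview(cooldown: Cooldown, now: number, action: PreviewAction, speaker: string, pitch: number): PreviewRequest | null {
  if (!cooldown.finished(now)) {
    return null;
  }
  cooldown.start(now, 500);
  return { speaker, pitch, silicon: action === "play_voice_robot", local: true };
}

const shared = new Cooldown();
console.log(requestPreview(shared, 0, "play_voice", "Voice A", 0) !== null); // true
console.log(requestPreview(shared, 200, "play_voice_robot", "Voice A", 0)); // null
console.log(requestPreview(shared, 500, "play_voice_robot", "Voice A", 0)?.silicon); // true
```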
36 changes: 35 additions & 1 deletion code/modules/client/preferences/voice.dm
@@ -23,4 +23,38 @@
/datum/preference/choiced/voice/apply_to_human(mob/living/carbon/human/target, value)
if(SStts.tts_enabled && !(value in SStts.available_speakers))
value = pick(SStts.available_speakers) // As a failsafe
target.voice = value
if(!CONFIG_GET(flag/tts_allow_player_voice_disabling) || !target.client?.prefs.read_preference(/datum/preference/toggle/tts_voice_disable))
target.voice = value

/datum/preference/numeric/tts_voice_pitch
savefile_identifier = PREFERENCE_CHARACTER
savefile_key = "tts_voice_pitch"
category = PREFERENCE_CATEGORY_NON_CONTEXTUAL
minimum = -12
maximum = 12

/datum/preference/numeric/tts_voice_pitch/is_accessible(datum/preferences/preferences)
if(!SStts.tts_enabled || !SStts.pitch_enabled)
return FALSE
return ..()

/datum/preference/numeric/tts_voice_pitch/create_default_value()
return 0

/datum/preference/numeric/tts_voice_pitch/apply_to_human(mob/living/carbon/human/target, value)
if(SStts.tts_enabled && SStts.pitch_enabled)
target.pitch = value

/datum/preference/toggle/tts_voice_disable
savefile_identifier = PREFERENCE_CHARACTER
savefile_key = "tts_voice_disable"
category = PREFERENCE_CATEGORY_NON_CONTEXTUAL
default_value = FALSE

/datum/preference/toggle/tts_voice_disable/apply_to_human(mob/living/carbon/human/target, value)
return TRUE

/datum/preference/toggle/tts_voice_disable/is_accessible(datum/preferences/preferences)
if(!SStts.tts_enabled || !CONFIG_GET(flag/tts_allow_player_voice_disabling))
return FALSE
return ..()
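Taken together, the three preferences resolve to a voice and a pitch roughly as follows. The TypeScript sketch is a hedged model, not the preference datums themselves: `resolveVoiceSettings`, `VoicePrefs` and `ServerTts` are hypothetical, out-of-range pitches are simply clamped to the declared `minimum`/`maximum` of -12 and 12, and a disabled voice is shown as `null` where the DM code just skips assigning `target.voice`. `semitoneRatio` shows what a pitch value means acoustically, assuming the RVC pitch is a semitone offset (which the parrot comment "up-pitch by one octave" for 12 suggests).

```typescript
interface VoicePrefs {
  voice: string;
  pitch: number;
  disableVoice: boolean;
}

// Hypothetical view of the relevant subsystem state and config flag.
interface ServerTts {
  ttsEnabled: boolean;
  pitchEnabled: boolean;
  allowPlayerVoiceDisabling: boolean;
  availableSpeakers: string[];
}

const PITCH_MIN = -12;
const PITCH_MAX = 12;

// Frequency ratio for a semitone offset: +12 doubles the pitch, -12 halves it.
function semitoneRatio(semitones: number): number {
  return Math.pow(2, semitones / 12);
}

function resolveVoiceSettings(
  prefs: VoicePrefs,
  server: ServerTts,
  pickFallback: (speakers: string[]) => string,
): { voice: string | null; pitch: number } {
  let voice = prefs.voice;
  // Failsafe from apply_to_human: replace a voice the server no longer offers.
  if (server.ttsEnabled && server.availableSpeakers.indexOf(voice) === -1) {
    voice = pickFallback(server.availableSpeakers);
  }
  // The toggle only matters when the config flag lets players disable voices.
  const disabled = server.allowPlayerVoiceDisabling && prefs.disableVoice;
  // Pitch stays at its default of 0 unless both TTS and pitch support are on.
  const pitch = server.ttsEnabled && server.pitchEnabled
    ? Math.min(PITCH_MAX, Math.max(PITCH_MIN, prefs.pitch))
    : 0;
  return { voice: disabled ? null : voice, pitch };
}

const server: ServerTts = { ttsEnabled: true, pitchEnabled: true, allowPlayerVoiceDisabling: true, availableSpeakers: ["Voice A", "Voice B"] };
const first = (speakers: string[]) => speakers[0];
console.log(resolveVoiceSettings({ voice: "Removed Voice", pitch: 30, disableVoice: false }, server, first)); // { voice: 'Voice A', pitch: 12 }
console.log(semitoneRatio(12), semitoneRatio(-12)); // 2 0.5
```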
3 changes: 1 addition & 2 deletions code/modules/mob/living/living_say.dm
@@ -339,7 +339,6 @@ GLOBAL_LIST_INIT(message_modes_stat_limits, list(
is_speaker_whispering = TRUE

var/list/listening = get_hearers_in_view(message_range + whisper_range, source)

if(client) //client is so that ghosts don't have to listen to mice
for(var/mob/player_mob as anything in GLOB.player_list)
if(QDELETED(player_mob)) //Some times nulls and deleteds stay in this list. This is a workaround to prevent ic chat breaking for everyone when they do.
@@ -392,7 +391,7 @@ GLOBAL_LIST_INIT(message_modes_stat_limits, list(
if(length(tts_filter) > 0)
filter += tts_filter.Join(",")

INVOKE_ASYNC(SStts, TYPE_PROC_REF(/datum/controller/subsystem/tts, queue_tts_message), src, html_decode(tts_message_to_use), message_language, voice, filter.Join(","), listened, message_range = message_range)
INVOKE_ASYNC(SStts, TYPE_PROC_REF(/datum/controller/subsystem/tts, queue_tts_message), src, html_decode(tts_message_to_use), message_language, voice, filter.Join(","), listened, message_range = message_range, pitch = pitch, silicon = tts_silicon_voice_effect)

var/image/say_popup = image('icons/mob/effects/talk.dmi', src, "[bubble_type][talk_icon_state]", FLY_LAYER)
SET_PLANE_EXPLICIT(say_popup, ABOVE_GAME_PLANE, src)
7 changes: 7 additions & 0 deletions code/modules/mob/living/silicon/login.dm
@@ -1,6 +1,13 @@
/mob/living/silicon/Login()
if(mind)
mind?.remove_antags_for_borging()
if(SStts.tts_enabled)
var/voice_to_use = client?.prefs.read_preference(/datum/preference/choiced/voice)
var/pitch_to_use = client?.prefs.read_preference(/datum/preference/numeric/tts_voice_pitch)
if(voice_to_use)
voice = voice_to_use
if(pitch_to_use)
pitch = pitch_to_use
return ..()


2 changes: 1 addition & 1 deletion code/modules/mob/living/silicon/silicon.dm
@@ -13,7 +13,7 @@
flags_1 = PREVENT_CONTENTS_EXPLOSION_1
examine_cursor_icon = null
fire_stack_decay_rate = -0.55
voice_filter = "afftfilt=real='hypot(re,im)*sin(0)':imag='hypot(re,im)*cos(0)':win_size=512:overlap=1,rubberband=pitch=0.8"
tts_silicon_voice_effect = TRUE
var/datum/ai_laws/laws = null//Now... THEY ALL CAN ALL HAVE LAWS
var/last_lawchange_announce = 0
var/list/alarms_to_show = list()
9 changes: 8 additions & 1 deletion code/modules/mob/living/simple_animal/parrot.dm
@@ -908,7 +908,6 @@ GLOBAL_LIST_INIT(strippable_parrot_items, create_strippable_list(list(
speak = list("Poly wanna cracker!", ":e Check the crystal, you chucklefucks!",":e Wire the solars, you lazy bums!",":e WHO TOOK THE DAMN MODSUITS?",":e OH GOD ITS ABOUT TO DELAMINATE CALL THE SHUTTLE")
gold_core_spawnable = NO_SPAWN
speak_chance = 3
voice_filter = "rubberband=pitch=1.5"

var/memory_saved = FALSE
var/rounds_survived = 0
@@ -920,6 +919,14 @@ GLOBAL_LIST_INIT(strippable_parrot_items, create_strippable_list(list(
ears = new /obj/item/radio/headset/headset_eng(src)
if(SStts.tts_enabled)
voice = pick(SStts.available_speakers)
if(SStts.pitch_enabled)
if(findtext(voice, "Woman"))
pitch = 12 // up-pitch by one octave
else
pitch = 24 // up-pitch by 2 octaves
else
voice_filter = "rubberband=pitch=1.5" // Use the filter to pitch up if we can't naturally pitch up.

available_channels = list(":e")
Read_Memory()
if(rounds_survived == longest_survival)
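Poly's new logic picks a semitone offset when the server supports RVC pitch adjustment and falls back to the old `rubberband=pitch=1.5` filter otherwise. A TypeScript sketch of that branch follows; `polyVoiceSettings` and `ratioToSemitones` are hypothetical names, and "Woman 3" / "Man 1" are made-up voice names. It lowercases before searching because DM's `findtext()` is case-insensitive. Assuming pitch values are semitones, the helper shows that a 1.5 ratio is only about 7 semitones, well short of the +12 or +24 the RVC path applies. Poly's +24 also lies outside the ±12 range players can pick.

```typescript
interface PolyVoice {
  pitch: number; // semitone offset sent to the TTS server
  voiceFilter: string; // ffmpeg filter used when pitch adjustment is unavailable
}

// Hypothetical helper (sketch only) mirroring the branch in the parrot's init.
function polyVoiceSettings(voice: string, pitchEnabled: boolean): PolyVoice {
  if (!pitchEnabled) {
    return { pitch: 0, voiceFilter: "rubberband=pitch=1.5" };
  }
  // findtext() is case-insensitive, hence the lowercase comparison.
  const isWomanVoice = voice.toLowerCase().indexOf("woman") !== -1;
  return { pitch: isWomanVoice ? 12 : 24, voiceFilter: "" };
}

// Semitones equivalent to a frequency ratio (inverse of 2^(n/12)).
function ratioToSemitones(ratio: number): number {
  return 12 * Math.log2(ratio);
}

console.log(polyVoiceSettings("Woman 3", true).pitch); // 12
console.log(polyVoiceSettings("Man 1", false).voiceFilter); // rubberband=pitch=1.5
console.log(ratioToSemitones(1.5).toFixed(2)); // 7.02
```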
5 changes: 5 additions & 0 deletions code/modules/surgery/organs/internal/tongue/_tongue.dm
@@ -7,6 +7,7 @@
slot = ORGAN_SLOT_TONGUE
attack_verb_continuous = list("licks", "slobbers", "slaps", "frenches", "tongues")
attack_verb_simple = list("lick", "slobber", "slap", "french", "tongue")
voice_filter = ""
/**
* A cached list of paths of all the languages this tongue is capable of speaking
*
@@ -133,6 +134,7 @@
REMOVE_TRAIT(tongue_owner, TRAIT_AGEUSIA, NO_TONGUE_TRAIT)
if(!sense_of_taste || (organ_flags & ORGAN_FAILING))
ADD_TRAIT(tongue_owner, TRAIT_AGEUSIA, ORGAN_TRAIT)
tongue_owner.voice_filter = voice_filter

/obj/item/organ/internal/tongue/Remove(mob/living/carbon/tongue_owner, special = FALSE)
. = ..()
@@ -142,6 +144,7 @@
REMOVE_TRAIT(tongue_owner, TRAIT_AGEUSIA, ORGAN_TRAIT)
// Carbons by default start with NO_TONGUE_TRAIT caused TRAIT_AGEUSIA
ADD_TRAIT(tongue_owner, TRAIT_AGEUSIA, NO_TONGUE_TRAIT)
tongue_owner.voice_filter = initial(tongue_owner.voice_filter)

/obj/item/organ/internal/tongue/apply_organ_damage(damage_amount, maximum, required_organtype)
. = ..()
@@ -469,6 +472,7 @@ GLOBAL_LIST_INIT(english_to_zombie, list())
attack_verb_simple = list("beep", "boop")
modifies_speech = TRUE
taste_sensitivity = 25 // not as good as an organic tongue
voice_filter = "alimiter=0.9,acompressor=threshold=0.2:ratio=20:attack=10:release=50:makeup=2,highpass=f=1000"

/obj/item/organ/internal/tongue/robot/can_speak_language(language)
return TRUE // THE MAGIC OF ELECTRONICS
@@ -481,6 +485,7 @@ GLOBAL_LIST_INIT(english_to_zombie, list())
desc = "A minutely toothed, chitious ribbon, which as a side effect, makes all snails talk IINNCCRREEDDIIBBLLYY SSLLOOWWLLYY."
color = "#96DB00" // TODO proper sprite, rather than recoloured pink tongue
modifies_speech = TRUE
voice_filter = "atempo=0.5" // makes them talk really slow

/obj/item/organ/internal/tongue/snail/modify_speech(datum/source, list/speech_args)
var/new_message
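The tongue change follows an overwrite-and-reset pattern: `Insert()` copies the organ's `voice_filter` onto its owner, and `Remove()` restores the owner's type default via `initial()` rather than whatever value the owner held before. The TypeScript sketch below models that with hypothetical `Speaker` and `Tongue` classes; its last lines illustrate the consequence that a filter set on the owner before the tongue went in is not restored on removal.

```typescript
// Hypothetical model of a tongue owner (sketch only).
class Speaker {
  // Stands in for initial(voice_filter): the compile-time default of the type.
  static readonly DEFAULT_VOICE_FILTER: string = "";
  voiceFilter: string = Speaker.DEFAULT_VOICE_FILTER;
}

// Hypothetical model of a tongue organ carrying its own filter chain.
class Tongue {
  constructor(readonly voiceFilter: string) {}

  insert(owner: Speaker): void {
    owner.voiceFilter = this.voiceFilter; // tongue_owner.voice_filter = voice_filter
  }

  remove(owner: Speaker): void {
    owner.voiceFilter = Speaker.DEFAULT_VOICE_FILTER; // initial(tongue_owner.voice_filter)
  }
}

const snailTongue = new Tongue("atempo=0.5");
const speaker = new Speaker();
speaker.voiceFilter = "highpass=f=1000"; // some earlier, non-default filter
snailTongue.insert(speaker);
console.log(speaker.voiceFilter); // atempo=0.5
snailTongue.remove(speaker);
console.log(JSON.stringify(speaker.voiceFilter)); // "" (the earlier filter is gone)
```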
2 changes: 1 addition & 1 deletion code/modules/vending/_vending.dm
@@ -64,7 +64,7 @@
payment_department = ACCOUNT_SRV
light_power = 0.7
light_range = MINIMUM_USEFUL_LIGHT_RANGE
voice_filter = "aderivative"
voice_filter = "alimiter=0.9,acompressor=threshold=0.2:ratio=20:attack=10:release=50:makeup=2,highpass=f=1000"

/// Is the machine active (No sales pitches if off)!
var/active = 1
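The vending machine now shares the robotic tongue's filter chain, and the old silicon filter shows why these strings need care: its `afftfilt` expressions contain commas inside single quotes, so a naive split on `,` would cut them apart. A TypeScript sketch of a quote-aware splitter for inspecting such chains follows. `parseFilterChain` is a hypothetical helper, and it deliberately ignores ffmpeg's backslash escaping, so treat it as a reading aid rather than a full filtergraph parser.

```typescript
interface FilterSpec {
  name: string; // e.g. "acompressor"
  args: string; // raw option string after the first "=", or "" if none
}

// Hypothetical helper (sketch only): splits a linear ffmpeg filter chain on
// commas that sit outside single quotes. Backslash escapes are not handled.
function parseFilterChain(chain: string): FilterSpec[] {
  const parts: string[] = [];
  let current = "";
  let inQuote = false;
  for (let i = 0; i < chain.length; i++) {
    const ch = chain.charAt(i);
    if (ch === "'") {
      inQuote = !inQuote;
    }
    if (ch === "," && !inQuote) {
      parts.push(current);
      current = "";
      continue;
    }
    current += ch;
  }
  if (current.length > 0) {
    parts.push(current);
  }
  return parts.map((part) => {
    const eq = part.indexOf("=");
    return eq === -1 ? { name: part, args: "" } : { name: part.slice(0, eq), args: part.slice(eq + 1) };
  });
}

const vending = parseFilterChain("alimiter=0.9,acompressor=threshold=0.2:ratio=20:attack=10:release=50:makeup=2,highpass=f=1000");
console.log(vending.map((f) => f.name)); // [ 'alimiter', 'acompressor', 'highpass' ]
const oldSilicon = parseFilterChain("afftfilt=real='hypot(re,im)*sin(0)':imag='hypot(re,im)*cos(0)':win_size=512:overlap=1,rubberband=pitch=0.8");
console.log(oldSilicon.map((f) => f.name)); // [ 'afftfilt', 'rubberband' ]
```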
7 changes: 7 additions & 0 deletions config/config.txt
@@ -574,6 +574,13 @@ PR_ANNOUNCEMENTS_PER_ROUND 5
## The maximum number of concurrent tts http requests that can be made by the server at once.
#TTS_MAX_CONCURRENT_REQUESTS 4

## Add voices to the TTS voice blacklist.
#TTS_VOICE_BLACKLIST Sans Undertale
#TTS_VOICE_BLACKLIST Papyrus Undertale

## Uncomment this to allow players to disable having a voice on their character for TTS.
#TTS_ALLOW_PLAYER_VOICE_DISABLING

## Comment to disable sending a toast notification on the host server when initializations complete.
## Even if this is enabled, a notification will only be sent if there are no clients connected.
TOAST_NOTIFICATION_ON_INIT
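The two new entries use different config entry types: `TTS_VOICE_BLACKLIST` is a `str_list`, which (as the example lines suggest) can be repeated to add one voice per line, and `TTS_ALLOW_PLAYER_VOICE_DISABLING` is a `flag`, enabled by being uncommented. The TypeScript sketch below reads only those two keys. `parseTtsConfig` is hypothetical and much simpler than the real config loader (no includes, no other entry types), so the accumulate-per-line behavior for `str_list` is an assumption of the sketch.

```typescript
interface TtsConfig {
  voiceBlacklist: string[];
  allowPlayerVoiceDisabling: boolean;
}

// Hypothetical reader for the two TTS entries (sketch only).
function parseTtsConfig(text: string): TtsConfig {
  const config: TtsConfig = { voiceBlacklist: [], allowPlayerVoiceDisabling: false };
  for (const rawLine of text.split("\n")) {
    const line = rawLine.trim();
    // Blank lines and "#" comments (including commented-out entries) are skipped.
    if (line === "" || line.charAt(0) === "#") {
      continue;
    }
    const space = line.indexOf(" ");
    const key = (space === -1 ? line : line.slice(0, space)).toUpperCase();
    const value = space === -1 ? "" : line.slice(space + 1).trim();
    if (key === "TTS_VOICE_BLACKLIST" && value !== "") {
      config.voiceBlacklist.push(value); // assumed: each repeated line adds one voice
    } else if (key === "TTS_ALLOW_PLAYER_VOICE_DISABLING") {
      config.allowPlayerVoiceDisabling = true; // flag is on when present
    }
  }
  return config;
}

const enabled = parseTtsConfig([
  "TTS_VOICE_BLACKLIST Sans Undertale",
  "TTS_VOICE_BLACKLIST Papyrus Undertale",
  "TTS_ALLOW_PLAYER_VOICE_DISABLING",
].join("\n"));
console.log(enabled.voiceBlacklist.length, enabled.allowPlayerVoiceDisabling); // 2 true
```

With the shipped file, where all three lines are commented out, the sketch returns an empty blacklist and leaves voice disabling off.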
@@ -1,4 +1,4 @@
import { FeatureChoiced, FeatureChoicedServerData, FeatureDropdownInput, FeatureValueProps } from '../base';
import { FeatureChoiced, FeatureChoicedServerData, FeatureDropdownInput, FeatureValueProps, FeatureNumeric, FeatureNumberInput, FeatureToggle, CheckboxInput } from '../base';
import { Stack, Button } from '../../../../../components';

const FeatureTTSDropdownInput = (
@@ -19,6 +19,16 @@ const FeatureTTSDropdownInput = (
height="100%"
/>
</Stack.Item>
<Stack.Item>
<Button
onClick={() => {
props.act('play_voice_robot');
}}
icon="robot"
width="100%"
height="100%"
/>
</Stack.Item>
</Stack>
);
};
@@ -27,3 +37,15 @@ export const tts_voice: FeatureChoiced = {
name: 'Voice',
component: FeatureTTSDropdownInput,
};

export const tts_voice_pitch: FeatureNumeric = {
name: 'Voice Pitch Adjustment',
component: FeatureNumberInput,
};

export const tts_voice_disable: FeatureToggle = {
name: 'Voice Disable Toggle',
description:
'Disables the TTS voice for this specific character when enabled.',
component: CheckboxInput,
};
Binary file added tools/tts/tts-api/RoomImpulse.wav
Binary file not shown.
Binary file added tools/tts/tts-api/SynthImpulse.wav
Binary file not shown.