Voice input limit handling #2536

tom-doerr · 2024-12-05T12:41:23Z

Issue

I just took a long time explaining all my goals, features, todos, and issues in detail using voice input. It took over 7 minutes, and I was really happy that I had given it all the relevant information, only for the transcription to fail. I would be very glad if this could be avoided; having minutes of voice input go down the drain really hurts. I managed to copy the tmp audio recording file and hope to be able to transcribe it myself, but this obviously isn't a good solution.

Version and model info

Aider v0.66.0
Main model: claude-3-5-sonnet-20241022 with diff edit format, prompt cache, infinite output
Weak model: claude-3-5-haiku-20241022
Git repo: .git with 5 files
Repo-map: using 1024 tokens, files refresh
Added README.md to the chat.
Added experiment_fetch_recent_stars.py to the chat.
Added scrape_github.py to the chat.
Added test_scrape_github.py to the chat.
Restored previous conversation history.
Command Line Args: --deepseek --vim --analytics --analytics-log analytics.log
--cache-prompts --max-chat-history-tokens 10000 --voice-language de

Environment Variables:
OPENAI_API_KEY: ...U_UA
ANTHROPIC_API_KEY: ...egAA
Config File (/home/tom/.aider.conf.yml):
model: sonnet
editor: vim
cache-keepalive-pings:1

Defaults:
--model-settings-file:.aider.model.settings.yml
--model-metadata-file:.aider.model.metadata.json
--env-file: /home/tom/git/github_star_scraping/.env
--map-refresh: auto
--map-multiplier-no-files:2
--input-history-file:/home/tom/git/github_star_scraping/.aider.input.history
--chat-history-file:/home/tom/git/github_star_scraping/.aider.chat.history.md
--user-input-color:#00cc00
--tool-error-color:#FF2222
--tool-warning-color:#FFA500
--assistant-output-color:#0088ff
--code-theme: default
--aiderignore: /home/tom/git/github_star_scraping/.aiderignore
--lint-cmd: []
--test-cmd: []
--encoding: utf-8
--voice-format: wav

Option settings:

aiderignore: /home/tom/git/github_star_scraping/.aiderignore
alias: None
analytics: True
analytics_disable: False
analytics_log: analytics.log
anthropic_api_key: ...egAA
apply: None
apply_clipboard_edits: False
assistant_output_color: #0088ff
attribute_author: True
attribute_commit_message_author: False
attribute_commit_message_committer: False
attribute_committer: True
auto_commits: True
auto_lint: True
auto_test: False
cache_keepalive_pings: 1
cache_prompts: True
chat_history_file: /home/tom/git/github_star_scraping/.aider.chat.history.md
chat_language: None
check_update: True
code_theme: default
commit: False
commit_prompt: None
completion_menu_bg_color: None
completion_menu_color: None
completion_menu_current_bg_color: None
completion_menu_current_color: None
config: None
dark_mode: False
detect_urls: True
dirty_commits: True
dry_run: False
edit_format: None
editor: vim
editor_edit_format: None
editor_model: None
encoding: utf-8
env_file: /home/tom/git/github_star_scraping/.env
exit: False
fancy_input: True
file: None
files: []
git: True
gitignore: True
gui: False
input_history_file: /home/tom/git/github_star_scraping/.aider.input.history
install_main_branch: False
just_check_update: False
light_mode: False
lint: False
lint_cmd: []
list_models: None
llm_history_file: None
load: None
map_multiplier_no_files: 2
map_refresh: files
map_tokens: None
max_chat_history_tokens: 10000
message: None
message_file: None
model: deepseek/deepseek-coder
model_metadata_file: .aider.model.metadata.json
model_settings_file: .aider.model.settings.yml
openai_api_base: None
openai_api_deployment_id: None
openai_api_key: ...U_UA
openai_api_type: None
openai_api_version: None
openai_organization_id: None
pretty: True
read: None
restore_chat_history: False
show_diffs: False
show_model_warnings: True
show_prompts: False
show_release_notes: None
show_repo_map: False
skip_sanity_check_repo: False
stream: True
subtree_only: False
suggest_shell_commands: True
test: False
test_cmd: []
timeout: None
tool_error_color: #FF2222
tool_output_color: None
tool_warning_color: #FFA500
upgrade: False
user_input_color: #00cc00
verbose: False
verify_ssl: True
vim: True
voice_format: wav
voice_input_device: None
voice_language: de
weak_model: None
yes_always: None

tom-doerr · 2024-12-05T12:52:21Z

Maybe the default format should be MP3. For the 7 minutes, I have a 38 MB file. I just upgraded my internet connection, but uploading this file still takes 7 seconds. Before the upgrade I had 10 Mbit/s upload (also not that slow), and then it would have taken around 30 s just to send it. According to Perplexity, MP3 should also avoid hitting the 25 MB Whisper file size limit.
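
As a rough check of the numbers above, here is a minimal sketch that estimates the size of an uncompressed WAV recording and how long it takes to upload. The recording parameters (44.1 kHz, mono, 16-bit) and the exact byte value used for the 25 MB limit are assumptions for illustration, not values confirmed in this thread, and the function names are hypothetical.

```python
# Assumed recording parameters; the real values depend on the input device.
SAMPLE_RATE = 44_100   # samples per second (assumption)
CHANNELS = 1           # mono (assumption)
SAMPLE_WIDTH = 2       # bytes per sample, i.e. 16-bit PCM (assumption)

# Assumed byte value of the 25 MB limit mentioned above.
SIZE_LIMIT_BYTES = 25 * 1024 * 1024


def wav_size_bytes(seconds, sample_rate=SAMPLE_RATE, channels=CHANNELS,
                   sample_width=SAMPLE_WIDTH):
    """Approximate size of uncompressed PCM audio, ignoring the small WAV header."""
    return int(seconds * sample_rate * channels * sample_width)


def upload_seconds(size_bytes, mbit_per_s):
    """Time to send size_bytes over a link with the given upload rate."""
    return size_bytes * 8 / (mbit_per_s * 1_000_000)


def max_seconds_under_limit(limit_bytes=SIZE_LIMIT_BYTES):
    """Longest recording (in seconds) that stays under the size limit."""
    return limit_bytes / (SAMPLE_RATE * CHANNELS * SAMPLE_WIDTH)


if __name__ == "__main__":
    size = wav_size_bytes(7 * 60)
    print(f"7 min of WAV: {size / 1e6:.1f} MB")
    print(f"Upload at 10 Mbit/s: {upload_seconds(size, 10):.1f} s")
    print(f"Over the limit: {size > SIZE_LIMIT_BYTES}")
    print(f"Longest WAV under the limit: {max_seconds_under_limit() / 60:.1f} min")
```

Under these assumptions a 7-minute recording comes out at roughly 37 MB, which is in line with the 38 MB file reported here and well above the limit, and it takes about 30 s to send at 10 Mbit/s.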

paul-gauthier · 2024-12-05T15:18:34Z

Thanks for trying aider and filing this issue.

Unfortunately, converting to mp3 requires the user to have ffmpeg or libav installed, so it's tricky to make that the default.

You can certainly configure your aider with --voice-format mp3 to use it though.

turian · 2024-12-14T16:19:14Z

@paul-gauthier Agreed, though it is a bit sucky when you record a voice message, realize it's just slightly longer than the limit, and then have to copy the WAV from tmp and upload it to https://replicate.com/openai/whisper

Would be nicer if there were some fallback option that allowed you to record very long voice and use it without leaving aider.

[edit: Strangely, that Replicate model gave me nonsense, so I used this instead: https://replicate.com/cjwbw/whisper]
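
One possible shape for such a fallback, sketched here as an illustration only and not as anything aider implements: split a long WAV into chunks that each stay under the size limit, so they can be transcribed one by one. The sketch uses only Python's standard `wave` module; the `split_wav` name, the chunk file naming, and the 24 MB default are hypothetical choices.

```python
import wave


def split_wav(path, max_bytes=24 * 1024 * 1024):
    """Split a PCM WAV file into numbered chunk files.

    Each chunk holds at most max_bytes of audio data (the WAV header adds a
    few dozen bytes on top). Returns the list of chunk paths in order.
    """
    chunk_paths = []
    with wave.open(path, "rb") as src:
        params = src.getparams()
        bytes_per_frame = params.nchannels * params.sampwidth
        frames_per_chunk = max(1, max_bytes // bytes_per_frame)

        index = 0
        while True:
            frames = src.readframes(frames_per_chunk)
            if not frames:
                break  # end of the recording
            chunk_path = f"{path}.part{index:03d}.wav"
            with wave.open(chunk_path, "wb") as dst:
                # Copy channel count, sample width and rate from the source;
                # the frame count in the header is corrected on close.
                dst.setparams(params)
                dst.writeframes(frames)
            chunk_paths.append(chunk_path)
            index += 1
    return chunk_paths
```

Note that this cuts at fixed byte offsets, so a chunk boundary can land in the middle of a word; a real fallback would probably want to cut at pauses or overlap the chunks slightly.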

github-actions · 2024-12-29T00:01:31Z

I'm labeling this issue as stale because it has been open for 2 weeks with no activity. If there are no additional comments, I will close it in 7 days.

Note: A bot script made these updates to the issue.

paul-gauthier · 2025-01-04T20:08:34Z

The main branch now checks if the wav file is too large, and tries to convert it to mp3 if so.

The change is available in the main branch. You can get it by installing the latest version from github:

aider --install-main-branch

# or...

python -m pip install --upgrade --upgrade-strategy only-if-needed git+https://github.com/Aider-AI/aider.git

If you have a chance to try it, let me know if it works better for you.
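
The behavior described in this comment (check the WAV size, try to convert to mp3 if it is too large) can be sketched roughly as follows. This is not aider's actual code: the function names and the 25 MB byte value are assumptions, and the conversion here shells out to an `ffmpeg` binary on the PATH, falling back to the original WAV when ffmpeg is missing or the conversion fails.

```python
import os
import shutil
import subprocess

# Assumed byte value of the 25 MB limit discussed earlier in the thread.
SIZE_LIMIT_BYTES = 25 * 1024 * 1024


def is_too_large(path, limit=SIZE_LIMIT_BYTES):
    """True if the file at path is bigger than the upload limit."""
    return os.path.getsize(path) > limit


def prepare_for_upload(wav_path, limit=SIZE_LIMIT_BYTES):
    """Return the path of the file to send for transcription.

    Small recordings are returned unchanged. Oversized ones are converted
    to mp3 when ffmpeg is available; otherwise the original WAV is returned
    and the caller has to deal with the size problem.
    """
    if not is_too_large(wav_path, limit):
        return wav_path

    ffmpeg = shutil.which("ffmpeg")
    if ffmpeg is None:
        return wav_path  # no converter installed

    mp3_path = os.path.splitext(wav_path)[0] + ".mp3"
    result = subprocess.run(
        [ffmpeg, "-y", "-loglevel", "error", "-i", wav_path, mp3_path],
        capture_output=True,
    )
    if result.returncode != 0 or not os.path.exists(mp3_path):
        return wav_path  # conversion failed, keep the original
    return mp3_path
```

Converting only when the file is too large keeps ffmpeg an optional dependency: users with short recordings never need it.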

github-actions · 2025-01-26T00:01:47Z

I'm closing this enhancement request since it has been marked as 'fixed' for over 3 weeks. The requested feature should now be available in recent versions of aider.

If you find that this enhancement is still needed, please feel free to reopen this issue or create a new one.

Note: A bot script made these updates to the issue.

github-actions bot added the question label Dec 6, 2024

github-actions bot added the stale label Dec 29, 2024

paul-gauthier added the enhancement and fixed labels and removed the question and stale labels Jan 4, 2025

github-actions bot closed this as completed Jan 26, 2025
