Voice input limit handling #2536

Closed
tom-doerr opened this issue Dec 5, 2024 · 6 comments
Labels
enhancement New feature or request fixed

Comments

@tom-doerr

Issue

I just took a long time to explain all goals, features, todos, and issues in detail using voice input. It took over 7 minutes, and I was really happy that I had given it all the relevant information, only for the transcription to fail. I would be very glad if this could be avoided; having minutes of voice input go down the drain really hurts. I managed to copy the tmp audio recording file and hope to be able to transcribe it myself, but this obviously isn't a good solution.

Version and model info

Aider v0.66.0
Main model: claude-3-5-sonnet-20241022 with diff edit format, prompt cache, infinite output
Weak model: claude-3-5-haiku-20241022
Git repo: .git with 5 files
Repo-map: using 1024 tokens, files refresh
Added README.md to the chat.
Added experiment_fetch_recent_stars.py to the chat.
Added scrape_github.py to the chat.
Added test_scrape_github.py to the chat.
Restored previous conversation history.
Command Line Args: --deepseek --vim --analytics --analytics-log analytics.log
--cache-prompts --max-chat-history-tokens 10000 --voice-language de

Environment Variables:
OPENAI_API_KEY: ...U_UA
ANTHROPIC_API_KEY: ...egAA
Config File (/home/tom/.aider.conf.yml):
model: sonnet
editor: vim
cache-keepalive-pings:1

Defaults:
--model-settings-file:.aider.model.settings.yml
--model-metadata-file:.aider.model.metadata.json
--env-file: /home/tom/git/github_star_scraping/.env
--map-refresh: auto
--map-multiplier-no-files:2
--input-history-file:/home/tom/git/github_star_scraping/.aider.input.history
--chat-history-file:/home/tom/git/github_star_scraping/.aider.chat.history.md
--user-input-color:#00cc00
--tool-error-color:#FF2222
--tool-warning-color:#FFA500
--assistant-output-color:#0088ff
--code-theme: default
--aiderignore: /home/tom/git/github_star_scraping/.aiderignore
--lint-cmd: []
--test-cmd: []
--encoding: utf-8
--voice-format: wav

Option settings:

  • aiderignore: /home/tom/git/github_star_scraping/.aiderignore
  • alias: None
  • analytics: True
  • analytics_disable: False
  • analytics_log: analytics.log
  • anthropic_api_key: ...egAA
  • apply: None
  • apply_clipboard_edits: False
  • assistant_output_color: #0088ff
  • attribute_author: True
  • attribute_commit_message_author: False
  • attribute_commit_message_committer: False
  • attribute_committer: True
  • auto_commits: True
  • auto_lint: True
  • auto_test: False
  • cache_keepalive_pings: 1
  • cache_prompts: True
  • chat_history_file: /home/tom/git/github_star_scraping/.aider.chat.history.md
  • chat_language: None
  • check_update: True
  • code_theme: default
  • commit: False
  • commit_prompt: None
  • completion_menu_bg_color: None
  • completion_menu_color: None
  • completion_menu_current_bg_color: None
  • completion_menu_current_color: None
  • config: None
  • dark_mode: False
  • detect_urls: True
  • dirty_commits: True
  • dry_run: False
  • edit_format: None
  • editor: vim
  • editor_edit_format: None
  • editor_model: None
  • encoding: utf-8
  • env_file: /home/tom/git/github_star_scraping/.env
  • exit: False
  • fancy_input: True
  • file: None
  • files: []
  • git: True
  • gitignore: True
  • gui: False
  • input_history_file: /home/tom/git/github_star_scraping/.aider.input.history
  • install_main_branch: False
  • just_check_update: False
  • light_mode: False
  • lint: False
  • lint_cmd: []
  • list_models: None
  • llm_history_file: None
  • load: None
  • map_multiplier_no_files: 2
  • map_refresh: files
  • map_tokens: None
  • max_chat_history_tokens: 10000
  • message: None
  • message_file: None
  • model: deepseek/deepseek-coder
  • model_metadata_file: .aider.model.metadata.json
  • model_settings_file: .aider.model.settings.yml
  • openai_api_base: None
  • openai_api_deployment_id: None
  • openai_api_key: ...U_UA
  • openai_api_type: None
  • openai_api_version: None
  • openai_organization_id: None
  • pretty: True
  • read: None
  • restore_chat_history: False
  • show_diffs: False
  • show_model_warnings: True
  • show_prompts: False
  • show_release_notes: None
  • show_repo_map: False
  • skip_sanity_check_repo: False
  • stream: True
  • subtree_only: False
  • suggest_shell_commands: True
  • test: False
  • test_cmd: []
  • timeout: None
  • tool_error_color: #FF2222
  • tool_output_color: None
  • tool_warning_color: #FFA500
  • upgrade: False
  • user_input_color: #00cc00
  • verbose: False
  • verify_ssl: True
  • vim: True
  • voice_format: wav
  • voice_input_device: None
  • voice_language: de
  • weak_model: None
  • yes_always: None
@tom-doerr
Author

Maybe the default format should be MP3. For the 7 minutes, I have a 38 MB file. I just upgraded my internet connection, but uploading this file still takes 7 seconds. Before the upgrade I had 10 Mbit/s upload (also not that slow), and then it would have taken around 30 seconds just to send it. According to Perplexity, MP3 should also avoid hitting the 25 MB Whisper file size limit.
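
For reference, the numbers in this comment can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: the function names are made up for illustration, the 16-bit mono 44.1 kHz recording format is an assumption (the thread does not say what format the recording used), and the 25 MB limit is treated as decimal megabytes.

```python
def wav_size_bytes(seconds, sample_rate=44_100, channels=1, sample_width=2):
    """Approximate size of uncompressed PCM audio (ignores the small WAV header)."""
    return int(seconds * sample_rate * channels * sample_width)


def upload_seconds(size_bytes, uplink_mbit_per_s):
    """Time to send size_bytes over the given uplink, ignoring protocol overhead."""
    return size_bytes * 8 / (uplink_mbit_per_s * 1_000_000)


# The 25 MB limit mentioned in this thread, assumed here to mean decimal MB.
WHISPER_LIMIT_BYTES = 25 * 1_000_000

reported = 38 * 1_000_000  # the 38 MB recording from this comment

print(f"upload at 10 Mbit/s: {upload_seconds(reported, 10):.1f} s")  # 30.4 s
print(f"over the limit: {reported > WHISPER_LIMIT_BYTES}")           # True
# Assumed format: 16-bit mono at 44.1 kHz, for exactly 7 minutes.
print(f"7 min of WAV: {wav_size_bytes(7 * 60) / 1e6:.1f} MB")         # 37.0 MB
```

Under those assumptions, any uncompressed recording longer than roughly four and a half minutes would already exceed the limit.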

@paul-gauthier
Collaborator

Thanks for trying aider and filing this issue.

Unfortunately converting to mp3 requires the user to have ffmpeg or libav. So it's tricky to make that the default.

You can certainly configure your aider with --voice-format mp3 to use it though.
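
Before switching to `--voice-format mp3`, it may help to check whether a converter is installed. The snippet below is a small sketch, not part of aider: `find_mp3_converter` is a hypothetical helper that looks for the `ffmpeg` binary, or `avconv` (the libav command-line tool), on the PATH.

```python
import shutil


def find_mp3_converter(which=shutil.which):
    """Return the path of the first converter binary found on PATH, or None."""
    for name in ("ffmpeg", "avconv"):  # avconv is the libav command-line tool
        path = which(name)
        if path:
            return path
    return None


converter = find_mp3_converter()
if converter:
    print(f"found converter: {converter}")
else:
    print("no ffmpeg or avconv on PATH; mp3 conversion will likely fail")
```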

@github-actions github-actions bot added the question Further information is requested label Dec 6, 2024
@turian

turian commented Dec 14, 2024

@paul-gauthier Agreed, though it is a bit sucky when you record voice input, realize it's just slightly longer than the limit, and then have to copy the WAV from tmp and upload it to https://replicate.com/openai/whisper

Would be nicer if there were some fallback option that allowed you to record very long voice and use it without leaving aider.

[edit: Strangely, that Replicate model gave me nonsense, so I used this one instead: https://replicate.com/cjwbw/whisper]
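
One possible shape for such a fallback is to split a long recording into parts that each stay under the size limit and transcribe them one after another. The sketch below is purely illustrative and is not something aider is known to do: `split_wav` is a hypothetical helper built on Python's standard `wave` module, and `max_bytes` counts only audio data, not the small header of each part.

```python
import wave


def split_wav(src_path, out_prefix, max_bytes):
    """Split a WAV file into consecutive parts with at most max_bytes of audio data each."""
    parts = []
    with wave.open(src_path, "rb") as src:
        # One frame holds one sample for every channel.
        frame_size = src.getnchannels() * src.getsampwidth()
        frames_per_part = max(1, max_bytes // frame_size)
        while True:
            frames = src.readframes(frames_per_part)
            if not frames:
                break
            part_path = f"{out_prefix}_{len(parts):03d}.wav"
            with wave.open(part_path, "wb") as dst:
                # Copy the audio format so every part is a valid WAV on its own.
                dst.setnchannels(src.getnchannels())
                dst.setsampwidth(src.getsampwidth())
                dst.setframerate(src.getframerate())
                dst.writeframes(frames)
            parts.append(part_path)
    return parts
```

Cutting at fixed byte offsets can split a word in half, so a real implementation would probably want to cut at pauses or overlap the parts slightly.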


I'm labeling this issue as stale because it has been open for 2 weeks with no activity. If there are no additional comments, I will close it in 7 days.

Note: A bot script made these updates to the issue.

@github-actions github-actions bot added the stale label Dec 29, 2024
@paul-gauthier
Collaborator

The main branch now checks if the wav file is too large, and tries to convert it to mp3 if so.

The change is available in the main branch. You can get it by installing the latest version from github:

aider --install-main-branch

# or...

python -m pip install --upgrade --upgrade-strategy only-if-needed git+https://github.com/Aider-AI/aider.git

If you have a chance to try it, let me know if it works better for you.
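
For readers curious what "check the size, then try mp3" could look like, here is a rough sketch. It is not the code from the main branch: `pick_upload_file` and `pydub_wav_to_mp3` are hypothetical names, the 25 MB threshold is the figure quoted earlier in this thread, and the default converter assumes the third-party `pydub` package plus ffmpeg or libav are installed.

```python
import os

# Assumed threshold, based on the 25 MB figure quoted in this thread.
SIZE_LIMIT_BYTES = 25 * 1000 * 1000


def pydub_wav_to_mp3(wav_path, mp3_path):
    """Convert with pydub; imported lazily because it is an optional dependency."""
    from pydub import AudioSegment  # third-party; needs ffmpeg or libav at runtime

    AudioSegment.from_wav(wav_path).export(mp3_path, format="mp3")


def pick_upload_file(wav_path, limit=SIZE_LIMIT_BYTES, convert=pydub_wav_to_mp3):
    """Return the file to upload: the WAV if small enough, else an mp3 if conversion works."""
    if os.path.getsize(wav_path) <= limit:
        return wav_path
    mp3_path = os.path.splitext(wav_path)[0] + ".mp3"
    try:
        convert(wav_path, mp3_path)
    except Exception:
        # Conversion is unavailable or failed; fall back to the original recording.
        return wav_path
    return mp3_path
```

Passing the converter in as a parameter keeps the size check testable on machines without ffmpeg.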

@paul-gauthier paul-gauthier added enhancement New feature or request fixed and removed question Further information is requested stale labels Jan 4, 2025

I'm closing this enhancement request since it has been marked as 'fixed' for over 3 weeks. The requested feature should now be available in recent versions of aider.

If you find that this enhancement is still needed, please feel free to reopen this issue or create a new one.

Note: A bot script made these updates to the issue.
