Feat/voice clone agent #129

Open
wants to merge 12 commits into
base: main
from


@omgate234 commented Jan 23, 2025

Fixes #114

Summary by CodeRabbit

  • New Features

    • Added voice cloning functionality to the application.
    • Introduced ability to generate audio from text using cloned voices.
    • Integrated ElevenLabs voice synthesis capabilities.
    • Enhanced output of audio retrieval with direct URL links.
  • Improvements

    • Enhanced audio processing tools with new methods for voice cloning and synthesis.
    • Added authorization checks for voice cloning.
  • Technical Updates

    • Registered new CloneVoiceAgent in the chat handler system.
    • Expanded ElevenLabsTool with methods for audio cloning, voice retrieval, and text synthesis.


coderabbitai bot commented Jan 23, 2025

Walkthrough

The pull request introduces a new CloneVoiceAgent to facilitate voice cloning functionality using user-provided audio samples and the ElevenLabs API. This agent checks user authorization, downloads audio files, and generates synthesized audio from text. Additionally, it includes new methods in the ElevenLabsTool class for audio cloning and voice retrieval. The ChatHandler class is updated to register the new agent, and the VideoDBTool class enhances its audio retrieval capabilities by including URLs in its output.

Changes

  • backend/director/agents/clone_voice.py: Added new CloneVoiceAgent class with methods for voice cloning, including _download_audio_file, _download_video_file, _download_audio_from_video, and run.
  • backend/director/handler.py: Imported and registered CloneVoiceAgent in ChatHandler.
  • backend/director/tools/elevenlabs.py: Added methods clone_audio(), get_voice(), and synthesis_text() to ElevenLabsTool.
  • backend/director/tools/videodb_tool.py: Updated get_audio() method in VideoDBTool to include audio URL in return value.

Poem

🐰 A rabbit's tale of voices new,
Cloning sounds with digital hue,
ElevenLabs, our magic tool,
Transforming text with vocal rule!
Synthesized dreams take flight today,
In audio's enchanting way! 🎤


@omgate234 marked this pull request as ready for review January 28, 2025 05:15

@coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (5)
backend/director/agents/clone_voice.py (4)

19-19: Fix grammar in parameter description.

“urser” should be changed to “user” to maintain clarity and correctness.

-            "description": "List of audio file URLs to given by the urser to clone",
+            "description": "List of audio file URLs provided by the user to clone",

71-71: Consider broadening MIME type check for MP3 files.

Checking only for 'audio/mpeg' may reject valid MP3 files served with a variant Content-Type such as audio/mp3. Checking for the 'audio' prefix allows for such variations (and for other audio formats).

-if 'audio/mpeg' not in response.headers.get('Content-Type', ''):
+if not response.headers.get('Content-Type', '').startswith('audio'):

127-127: Correct the spelling in error message.

Change “Could'nt process the sample audioss” to a more grammatically correct phrasing.

-return AgentResponse(status=AgentStatus.ERROR, message="Could'nt process the sample audioss")
+return AgentResponse(status=AgentStatus.ERROR, message="Couldn't process the sample audios")

131-131: Remove extraneous f prefixes.

These f-strings contain no placeholders and can be regular strings, improving clarity.

-                f"Using previously generated cloned voice"
+                "Using previously generated cloned voice"

-                    f"Cloning the voice"
+                    "Cloning the voice"

-                    f"Synthesising the given text"
+                    "Synthesising the given text"

Also applies to: 137-137, 146-146

🧰 Tools
🪛 Ruff (0.8.2)

131-131: f-string without any placeholders

Remove extraneous f prefix

(F541)

backend/director/tools/elevenlabs.py (1)

5-5: Remove unused import.

play is not referenced in the code, which may trigger lint warnings and clutter the import list.

-from elevenlabs import VoiceSettings, Voice, play
+from elevenlabs import VoiceSettings, Voice
🧰 Tools
🪛 Ruff (0.8.2)

5-5: elevenlabs.play imported but unused

Remove unused import: elevenlabs.play

(F401)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9d8c991 and 1d96735.

📒 Files selected for processing (3)
  • backend/director/agents/clone_voice.py (1 hunks)
  • backend/director/handler.py (2 hunks)
  • backend/director/tools/elevenlabs.py (2 hunks)
🧰 Additional context used
🪛 Ruff (0.8.2)
backend/director/tools/elevenlabs.py

5-5: elevenlabs.play imported but unused

Remove unused import: elevenlabs.play

(F401)

backend/director/agents/clone_voice.py

131-131: f-string without any placeholders

Remove extraneous f prefix

(F541)


137-137: f-string without any placeholders

Remove extraneous f prefix

(F541)


146-146: f-string without any placeholders

Remove extraneous f prefix

(F541)

🔇 Additional comments (1)
backend/director/handler.py (1)

28-28: Integration looks good.

The CloneVoiceAgent import and registration follow the existing pattern. No issues found.

Also applies to: 70-70

sample_audios: list[str],
text_to_synthesis: str,
name_of_voice: str,
is_authorized_to_clone_voice: str,

🛠️ Refactor suggestion

Parameter type mismatch.

You declared is_authorized_to_clone_voice as a string, but the JSON schema specifies a boolean. Convert it to a boolean type to ensure consistency.

-def run(self, sample_audios: list[str], text_to_synthesis: str, name_of_voice: str, is_authorized_to_clone_voice: str, ...
+def run(self, sample_audios: list[str], text_to_synthesis: str, name_of_voice: str, is_authorized_to_clone_voice: bool, ...

Committable suggestion skipped: line range outside the PR's diff.
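The annotation matters beyond style: if the flag ever arrives as the string "false" (for example from a loosely parsed tool call), a plain truthiness check treats it as authorized, because every non-empty string is truthy. A minimal sketch of a strict parser, assuming the agent may receive either a real bool or a string; parse_authorization_flag is a hypothetical helper, not part of this PR.

```python
def parse_authorization_flag(value: object) -> bool:
    """Strictly interpret an authorization flag (hypothetical helper).

    Accepts real booleans and the strings "true"/"false" (case-insensitive,
    surrounding whitespace ignored); anything else is rejected instead of
    being coerced by truthiness.
    """
    if isinstance(value, bool):
        return value
    if isinstance(value, str):
        normalized = value.strip().lower()
        if normalized == "true":
            return True
        if normalized == "false":
            return False
    raise ValueError(f"Invalid authorization flag: {value!r}")


# Pitfall: any non-empty string is truthy, so "false" would pass `if flag:`.
print(bool("false"))                      # True
print(parse_authorization_flag("false"))  # False
```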


@coderabbitai bot left a comment


Actionable comments posted: 4

🧹 Nitpick comments (4)
backend/director/agents/clone_voice.py (4)

93-113: Audio file download logic can be improved with more robust content type validation.

The current check only verifies that the Content-Type starts with 'audio'. That rejects audio served with a generic type such as application/octet-stream, and the bare 'audio' prefix (without the slash) is looser than intended.

Consider using a more specific content type check:

-            if not response.headers.get('Content-Type', '').startswith('audio'):
-                raise ValueError(f"The URL does not point to an MP3 file: {audio_url}")
+            content_type = response.headers.get('Content-Type', '')
+            if not content_type.startswith('audio/') and content_type not in ['application/octet-stream']:
+                raise ValueError(f"The URL does not point to an audio file. Content-Type: {content_type}, URL: {audio_url}")
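Media types are case-insensitive and may carry parameters (for example "; charset=binary"), so normalizing the header before comparing may be more robust than a raw prefix check. A minimal sketch; is_acceptable_audio_type is a hypothetical helper, and accepting application/octet-stream mirrors the suggestion above, trading strictness for compatibility.

```python
from typing import Optional


def is_acceptable_audio_type(content_type: Optional[str]) -> bool:
    """Return True if a Content-Type header plausibly denotes audio (hypothetical helper)."""
    if not content_type:
        return False
    # Drop parameters after ";" and compare case-insensitively.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type.startswith("audio/") or media_type == "application/octet-stream"


print(is_acceptable_audio_type("Audio/MPEG; charset=binary"))  # True
print(is_acceptable_audio_type("text/html"))                   # False
```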

137-171: Good download and extraction logic, but consider adding timeout parameters.

The download and extraction logic is well-structured, but network requests don't have timeout parameters, which could lead to hanging requests if the server doesn't respond.

Add timeouts to network requests to ensure they don't hang indefinitely:

-            response = requests.get(audio_url, stream=True)
+            response = requests.get(audio_url, stream=True, timeout=30)  # 30 seconds timeout

Similarly for other requests.get() calls in the code.
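A sketch of a streamed download with an explicit timeout, assuming requests is the HTTP client as in the PR. requests accepts either a single number or a (connect, read) tuple; the read timeout bounds the wait between received bytes, not the total download time. download_with_timeout, DEFAULT_TIMEOUT and the 5/30 second values are hypothetical, and the http_get parameter exists only so the function can be exercised with a stub.

```python
import os
import uuid
from typing import Any, Callable, Optional, Tuple

# (connect timeout, read timeout) in seconds; illustrative values only.
DEFAULT_TIMEOUT: Tuple[float, float] = (5.0, 30.0)


def download_with_timeout(
    url: str,
    dest_dir: str,
    suffix: str = ".mp3",
    http_get: Optional[Callable[..., Any]] = None,
    timeout: Tuple[float, float] = DEFAULT_TIMEOUT,
) -> str:
    """Stream `url` into a uniquely named file under `dest_dir` and return its path."""
    if http_get is None:
        import requests  # imported lazily so a stub can be injected for testing

        http_get = requests.get
    os.makedirs(dest_dir, exist_ok=True)
    local_path = os.path.join(dest_dir, f"download_{uuid.uuid4()}{suffix}")
    # Using the response as a context manager closes it and releases the
    # pooled connection, which matters with stream=True.
    with http_get(url, stream=True, timeout=timeout) as response:
        response.raise_for_status()
        with open(local_path, "wb") as fh:
            for chunk in response.iter_content(chunk_size=65536):
                if chunk:  # skip empty keep-alive chunks
                    fh.write(chunk)
    return local_path
```

A request that exceeds either limit raises requests.exceptions.Timeout, which the existing `except Exception` handlers in the agent would already catch.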


260-271: Fix HTML indentation in the download link.

The triple-quoted string ends with a newline followed by the source indentation, so the rendered text carries unnecessary trailing whitespace.

                text_content = TextContent(
                    agent_name=self.agent_name,
                    status=MsgStatus.success,
                    status_message="Here is your generated audio",
-                    text=f"""Click <a href='{data_url}' download='{output_file_name}' target='_blank'>here</a> to download the audio
-                    """,
+                    text=f"""Click <a href='{data_url}' download='{output_file_name}' target='_blank'>here</a> to download the audio""",
                )

283-293: Consider adding file cleanup logic to prevent disk space issues.

The code downloads files but doesn't clean them up after use, which could lead to disk space issues over time.

Consider adding a cleanup function to remove temporary files after they're no longer needed:

def _cleanup_temp_files(self, *file_paths):
    """Remove temporary files to free up disk space."""
    for file_path in file_paths:
        if file_path and os.path.exists(file_path):
            try:
                os.remove(file_path)
                logger.debug(f"Removed temporary file: {file_path}")
            except Exception as e:
                logger.warning(f"Failed to remove temporary file {file_path}: {e}")

Then call this method after processing is complete, e.g., after line 273, add:

self._cleanup_temp_files(sample_file, output_path)
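Calling the cleanup only after a successful run still leaks files when an intermediate step raises. One way to guarantee cleanup is to register paths in a context manager whose `finally` block deletes them. A minimal sketch; temporary_files is a hypothetical helper, and a path should be registered only once nothing downstream (such as an upload of the synthesized file) still needs it.

```python
import contextlib
import logging
import os
from typing import Iterator, List

logger = logging.getLogger(__name__)


@contextlib.contextmanager
def temporary_files() -> Iterator[List[str]]:
    """Collect temporary file paths and delete them when the block exits.

    Cleanup runs in `finally`, so files are removed even if the body raises.
    """
    paths: List[str] = []
    try:
        yield paths
    finally:
        for path in paths:
            if path and os.path.exists(path):
                try:
                    os.remove(path)
                    logger.debug("Removed temporary file: %s", path)
                except OSError as exc:
                    logger.warning("Failed to remove temporary file %s: %s", path, exc)
```

Usage would look like `with temporary_files() as tmp:` followed by `tmp.append(sample_file)` right after each download.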
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e4737b1 and ffb3500.

📒 Files selected for processing (2)
  • backend/director/agents/clone_voice.py (1 hunks)
  • backend/director/tools/videodb_tool.py (1 hunks)
🔇 Additional comments (3)
backend/director/tools/videodb_tool.py (1)

125-125: Good enhancement to include URL in audio retrieval.

Adding the URL generation to the returned audio data provides direct access to the audio file, which is essential for the voice cloning functionality where audio files need to be downloaded and processed.

backend/director/agents/clone_voice.py (2)

14-83: Parameter schema looks well-structured.

The parameter schema for the CloneVoiceAgent is comprehensive and clearly defines the required and optional parameters with appropriate descriptions. The schema properly handles the two different types of audio sources (direct URL or video extraction) through the oneOf construct.


200-205:

❓ Verification inconclusive

Consider additional server-side validation for authorization.

The method relies solely on a client-provided flag to determine if the user is authorized to clone voices. This could be bypassed if a malicious client sets the flag to True.

Consider implementing additional server-side validation to ensure the user truly has permission:


🏁 Script executed:

#!/bin/bash
# Check if there's any server-side authorization validation for voice cloning

# Look for authorization checks in related files
rg -A 5 "authorized.*clone" --glob "*.py"

Length of output: 3051


Server-Side Authorization Validation Required

Verification confirms that the check in backend/director/agents/clone_voice.py at lines 200-205 relies solely on the client-provided is_authorized_to_clone_voice flag. The API schema and inline comments show no additional server-side check that the caller actually has permission to clone voices, so a malicious client could bypass the check by simply setting the flag to True.

  • The API schema explicitly accepts a boolean flag without any further verification.
  • The existing code only returns an error if the flag is false, without cross-checking user permissions against a secure server-side context.

Recommendation:
Consider integrating an explicit server-side validation mechanism. For example, use an authentication context or permission service to verify if the user is authorized before allowing the cloning operation:

# Example pseudocode snippet
if not validate_user_permissions(user_context, 'clone_voice'):
    return AgentResponse(status=AgentStatus.ERROR, message="User does not have permission to clone voice")

This validation would complement the current logic by ensuring that the authorization status reflects the actual user's permissions, rather than solely relying on client input.
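The pseudocode above can be made concrete along these lines. This is a minimal sketch under stated assumptions: UserContext, CLONE_VOICE_PERMISSION and may_clone_voice are hypothetical names, and the permissions set is assumed to come from the authenticated session on the server, never from the request payload. The client flag is kept only as an explicit consent confirmation.

```python
from dataclasses import dataclass, field
from typing import FrozenSet

CLONE_VOICE_PERMISSION = "clone_voice"  # hypothetical permission name


@dataclass(frozen=True)
class UserContext:
    """Hypothetical server-side identity, populated from the authenticated session."""

    user_id: str
    permissions: FrozenSet[str] = field(default_factory=frozenset)


def may_clone_voice(user: UserContext, client_claims_authorized: bool) -> bool:
    """Allow cloning only if the client confirms consent AND the server grants permission.

    The decisive check uses data the client cannot set; the flag alone is never enough.
    """
    return client_claims_authorized is True and CLONE_VOICE_PERMISSION in user.permissions


alice = UserContext("alice", frozenset({CLONE_VOICE_PERMISSION}))
mallory = UserContext("mallory")
print(may_clone_voice(alice, True))    # True
print(may_clone_voice(mallory, True))  # False: setting the flag is not sufficient
```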

Comment on lines +239 to +246
output_file_name = f"audio_clone_voice_output_{str(uuid.uuid4())}.mp3"
output_path = f"{DOWNLOADS_PATH}/{output_file_name}"

with open(output_path, "wb") as f:
    for chunk in synthesised_audio:
        if chunk:
            f.write(chunk)


🛠️ Refactor suggestion

Use os.path.join for file path construction.

Building file paths with string formatting hard-codes the "/" separator. os.path.join() uses the platform's separator and matches how paths are built elsewhere in this file (e.g. in _download_video_file).

            output_file_name = f"audio_clone_voice_output_{str(uuid.uuid4())}.mp3"
-            output_path = f"{DOWNLOADS_PATH}/{output_file_name}"
+            output_path = os.path.join(DOWNLOADS_PATH, output_file_name)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
 output_file_name = f"audio_clone_voice_output_{str(uuid.uuid4())}.mp3"
-output_path = f"{DOWNLOADS_PATH}/{output_file_name}"
+output_path = os.path.join(DOWNLOADS_PATH, output_file_name)
 with open(output_path, "wb") as f:
     for chunk in synthesised_audio:
         if chunk:
             f.write(chunk)

Comment on lines +210 to +217
if "audio_url" in audio_source:
    sample_file = self._download_audio_file(audio_source["audio_url"])

if "video_id" in audio_source:
    sample_file = self._download_audio_from_video(audio_source)

if not sample_file:
    return AgentResponse(status=AgentStatus.ERROR, message="Could'nt process the sample audios")

🛠️ Refactor suggestion

Fix typo in error message and add validation for audio_url.

There's a typo in the error message and there's no validation for the audio_url case unlike the video_id case.

            if "audio_url" in audio_source:
+                if not audio_source["audio_url"]:
+                    return AgentResponse(status=AgentStatus.ERROR, message="Audio URL is missing or empty")
                sample_file = self._download_audio_file(audio_source["audio_url"])
            
            if "video_id" in audio_source:
                sample_file = self._download_audio_from_video(audio_source)

            if not sample_file:
-                return AgentResponse(status=AgentStatus.ERROR, message="Could'nt process the sample audios")
+                return AgentResponse(status=AgentStatus.ERROR, message="Couldn't process the sample audios")
📝 Committable suggestion

Suggested change
 if "audio_url" in audio_source:
+    if not audio_source["audio_url"]:
+        return AgentResponse(status=AgentStatus.ERROR, message="Audio URL is missing or empty")
     sample_file = self._download_audio_file(audio_source["audio_url"])
 if "video_id" in audio_source:
     sample_file = self._download_audio_from_video(audio_source)
 if not sample_file:
-    return AgentResponse(status=AgentStatus.ERROR, message="Could'nt process the sample audios")
+    return AgentResponse(status=AgentStatus.ERROR, message="Couldn't process the sample audios")

Comment on lines +114 to +136
def _download_video_file(self, video_url: str) -> str | None:
    os.makedirs(DOWNLOADS_PATH, exist_ok=True)

    try:
        response = requests.get(video_url, stream=True)
        response.raise_for_status()

        if not response.headers.get('Content-Type', '').startswith('video'):
            raise ValueError(f"The URL does not point to a video file: {video_url}")

        download_file_name = f"video_download_{str(uuid.uuid4())}.mp4"
        local_path = os.path.join(DOWNLOADS_PATH, download_file_name)

        with open(local_path, 'wb') as file:
            for chunk in response.iter_content(chunk_size=65536):
                file.write(chunk)

        return local_path

    except Exception as e:
        print(f"Failed to download {video_url}: {e}")
        return None


🛠️ Refactor suggestion

Replace print with logger for consistency in error handling.

The error handling in this method uses print() while other methods use the logger. This inconsistency makes debugging and log monitoring more difficult.

-            print(f"Failed to download {video_url}: {e}")
+            logger.error(f"Failed to download {video_url}: {e}")
📝 Committable suggestion

Suggested change
 def _download_video_file(self, video_url: str) -> str | None:
     os.makedirs(DOWNLOADS_PATH, exist_ok=True)
     try:
         response = requests.get(video_url, stream=True)
         response.raise_for_status()
         if not response.headers.get('Content-Type', '').startswith('video'):
             raise ValueError(f"The URL does not point to a video file: {video_url}")
         download_file_name = f"video_download_{str(uuid.uuid4())}.mp4"
         local_path = os.path.join(DOWNLOADS_PATH, download_file_name)
         with open(local_path, 'wb') as file:
             for chunk in response.iter_content(chunk_size=65536):
                 file.write(chunk)
         return local_path
     except Exception as e:
-        print(f"Failed to download {video_url}: {e}")
+        logger.error(f"Failed to download {video_url}: {e}")
         return None
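Inside an `except` block, logger.exception(...) is a possible alternative to logger.error(...): it logs at ERROR level and also records the traceback, which the formatted `{e}` message discards. A minimal sketch; try_download is a hypothetical wrapper used only to show the pattern, and the %-style arguments defer string formatting until the record is actually emitted.

```python
import logging
from typing import Callable, Optional

logger = logging.getLogger(__name__)


def try_download(download: Callable[[str], str], url: str) -> Optional[str]:
    """Run `download(url)`, logging failures (with traceback) instead of printing them."""
    try:
        return download(url)
    except Exception:
        # Logs at ERROR level and attaches the active exception's traceback.
        logger.exception("Failed to download %s", url)
        return None
```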

Comment on lines +173 to +198
def run(
    self,
    audio_source: dict,
    text_to_synthesis: str,
    name_of_voice: str,
    is_authorized_to_clone_voice: bool,
    collection_id: str,
    description="",
    cloned_voice_id=None,
    *args,
    **kwargs) -> AgentResponse:
    """
    Clone the given audio file and synthesis the given text

    :param list sample_audios: The urls of the video given to clone
    :param str text_to_synthesis: The given text which needs to be synthesised in the cloned voice
    :param bool is_authorized_to_clone_voice: The flag which tells whether the user is authorised to clone the audio or not
    :param str name_of_voice: The name to be given to the cloned voice
    :param str descrption: The description about how the voice sounds like
    :param str collection_id: The collection id to store generated voice
    :param str cloned_voice_id: The voice ID generated from the previously given voice which can be used for cloning
    :param args: Additional positional arguments.
    :param kwargs: Additional keyword arguments.
    :return: The response containing information about voice cloning.
    :rtype: AgentResponse
    """

🛠️ Refactor suggestion

Docstring parameter list doesn't match the actual method parameters.

The docstring references sample_audios which doesn't exist in the method signature, and it's missing documentation for the audio_source parameter which is actually used.

        """
        Clone the given audio file and synthesis the given text

-        :param list sample_audios: The urls of the video given to clone
+        :param dict audio_source: The source of the audio, either containing an audio_url or video_id with timing parameters
        :param str text_to_synthesis: The given text which needs to be synthesised in the cloned voice
        :param bool is_authorized_to_clone_voice: The flag which tells whether the user is authorised to clone the audio or not
        :param str name_of_voice: The name to be given to the cloned voice
        :param str descrption: The description about how the voice sounds like
        :param str collection_id: The collection id to store generated voice
        :param str cloned_voice_id: The voice ID generated from the previously given voice which can be used for cloning
        :param args: Additional positional arguments.
        :param kwargs: Additional keyword arguments.
        :return: The response containing information about voice cloning.
        :rtype: AgentResponse
        """
📝 Committable suggestion

Suggested change
 def run(
     self,
     audio_source: dict,
     text_to_synthesis: str,
     name_of_voice: str,
     is_authorized_to_clone_voice: bool,
     collection_id: str,
     description="",
     cloned_voice_id=None,
     *args,
     **kwargs) -> AgentResponse:
     """
     Clone the given audio file and synthesis the given text
-    :param list sample_audios: The urls of the video given to clone
+    :param dict audio_source: The source of the audio, either containing an audio_url or video_id with timing parameters
     :param str text_to_synthesis: The given text which needs to be synthesised in the cloned voice
     :param bool is_authorized_to_clone_voice: The flag which tells whether the user is authorised to clone the audio or not
     :param str name_of_voice: The name to be given to the cloned voice
     :param str descrption: The description about how the voice sounds like
     :param str collection_id: The collection id to store generated voice
     :param str cloned_voice_id: The voice ID generated from the previously given voice which can be used for cloning
     :param args: Additional positional arguments.
     :param kwargs: Additional keyword arguments.
     :return: The response containing information about voice cloning.
     :rtype: AgentResponse
     """


@coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (3)
backend/director/agents/clone_voice.py (3)

47-80: Clarify usage of collection_id in both schema and top-level parameters.

collection_id appears both within the second object of "audio_source" and again as a top-level parameter. This duplication may be confusing for consumers of this API. Consider consolidating or clarifying how/when each collection_id is used.


101-102: Validate content-type beyond simple prefix matching.

Relying solely on response.headers.get('Content-Type', '').startswith('audio') might fail if the server doesn't set (or sets an unexpected) HTTP header. Consider additional checks or fallback logic to handle possible discrepancies.
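One possible fallback: trust a specific Content-Type when the server sends one, and otherwise (header missing or the generic application/octet-stream) guess the type from the URL's file extension using the standard-library mimetypes table. A minimal sketch; looks_like_audio is a hypothetical helper. Parsing with urlparse first drops query strings, which signed download URLs often carry.

```python
import mimetypes
from typing import Optional
from urllib.parse import urlparse


def looks_like_audio(content_type: Optional[str], url: str) -> bool:
    """Return True if the header, or failing that the URL extension, indicates audio."""
    media_type = (content_type or "").split(";", 1)[0].strip().lower()
    if media_type and media_type != "application/octet-stream":
        # A specific header is authoritative.
        return media_type.startswith("audio/")
    # Missing or generic header: fall back to the path's file extension.
    guessed, _encoding = mimetypes.guess_type(urlparse(url).path)
    return bool(guessed and guessed.startswith("audio/"))


print(looks_like_audio(None, "https://cdn.example.com/sample.mp3?sig=abc"))  # True
print(looks_like_audio("text/html", "https://cdn.example.com/sample.mp3"))   # False
```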


223-223: Correct spelling in error message.

"Could'nt" should be "Couldn't" in the error string for clarity and correctness.

- return AgentResponse(status=AgentStatus.ERROR, message="Could'nt process the sample audios")
+ return AgentResponse(status=AgentStatus.ERROR, message="Couldn't process the sample audios")
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ffb3500 and 953b37d.

📒 Files selected for processing (1)
  • backend/director/agents/clone_voice.py (1 hunks)
🔇 Additional comments (4)
backend/director/agents/clone_voice.py (4)

136-136: Replace print with logger for consistency in error handling.

Similar to a past review comment, please use logger.error(...) instead of print(...) to maintain consistent logging across methods.


193-194: Fix docstring to match actual parameters.

A previous review comment noted that the docstring references sample_audios, but the method signature uses audio_source. Update the docstring to avoid confusion.


219-219: Confirm single source logic.

Currently, the code checks if "audio_url" in audio_source: and then, separately, if "video_id" in audio_source:. If both keys are present, both blocks run and the second assignment overwrites sample_file. The schema is designed to accept only one or the other, yet there's no explicit elif or other guard against both keys being present.

Could you confirm that the schema fully enforces exclusivity such that only one key can exist, preventing undesired double processing? If not, use elif to ensure only one path is taken.
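For context: a JSON Schema oneOf whose branches each simply require their own key does reject an object that satisfies both branches, but only if something actually validates the payload against the schema; a tool call produced by an LLM is not guaranteed to conform. A runtime guard makes the single-path behavior explicit. A minimal sketch; resolve_audio_source is a hypothetical helper that treats empty values as absent, which also covers the empty audio_url case raised earlier.

```python
from typing import Any, Mapping, Tuple


def resolve_audio_source(audio_source: Mapping[str, Any]) -> Tuple[str, Any]:
    """Return ("audio_url", url) or ("video_id", id); reject ambiguous input.

    Raises ValueError when both or neither key has a non-empty value,
    so the caller takes exactly one download path.
    """
    has_url = bool(audio_source.get("audio_url"))
    has_video = bool(audio_source.get("video_id"))
    if has_url and has_video:
        raise ValueError("Provide either audio_url or video_id, not both")
    if has_url:
        return "audio_url", audio_source["audio_url"]
    if has_video:
        return "video_id", audio_source["video_id"]
    raise ValueError("audio_source needs a non-empty audio_url or video_id")
```

In run, the result can then drive an if/elif so that only one of _download_audio_file or _download_audio_from_video is ever called.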


246-246: Use os.path.join for file path construction.

This is identical to a past review comment. Instead of:

output_path = f"{DOWNLOADS_PATH}/{output_file_name}"

use:

output_path = os.path.join(DOWNLOADS_PATH, output_file_name)

Development

Successfully merging this pull request may close these issues.

Voice cloning feature for end to end production flows
1 participant