Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incorrect handling of multiple edits to the same file when LLM skips filename for repeat blocks and uses block type specifiers #2959

Closed
jhf opened this issue Jan 22, 2025 · 6 comments
Labels
question Further information is requested

Comments

@jhf
Copy link

jhf commented Jan 22, 2025

Issue

Here is an example session where there are multiple edits to the same file, but aider thinks it is editing the crystal file.
It seems that either aider should recognise that it is the same file being edited multiple times, i.e. the filename cli/src/statbus.cr is used continuously for multiple SEARCH/REPLACE blocks, or the edits should be discarded.
You see from the history that there is confusion between the file name and the contents of the block (crystal code).

The current behaviour is incorrect as it makes a partial edit and then tries to make a new file crystal.

So the essence of the problem is the LLM providing a type specifier for the SEARCH/REPLACE block and aider getting confused because of this.

Copy paste from the aider chat:

> The ::BIGINT casting is not required, just use the Int8 when reading the value.

I'll help remove the unnecessary ::BIGINT casting and update the version handling. Here are the changes:

cli/src/statbus.cr


 <<<<<<< SEARCH
           "SELECT version FROM db.migration WHERE version = $1::BIGINT",
 =======
           "SELECT version FROM db.migration WHERE version = $1",
 >>>>>>> REPLACE



 <<<<<<< SEARCH
             WHERE version #{@migrate_to ? "<= $1::BIGINT" : "IS NOT NULL"}
 =======
             WHERE version #{@migrate_to ? "<= $1" : "IS NOT NULL"}
 >>>>>>> REPLACE



 <<<<<<< SEARCH
           args = @migrate_to ? [@migrate_to.to_i64.to_s] : [] of String
 =======
           args = @migrate_to ? [@migrate_to] : [] of String
 >>>>>>> REPLACE


These changes:

 1 Remove unnecessary ::BIGINT type casting since PostgreSQL will handle the type conversion automatically
 2 Keep using Int64/Int8 for version numbers internally
 3 Maintain the same functionality while being more concise


Tokens: 23k sent, 224 received. Cost: $0.07 message, $0.48 session.

crystal
Create new file? (Y)es/(N)o [Yes]:

Relevant excerpt of .aider.chat.history.md

#### The ::BIGINT casting is not required, just use the Int8 when reading the value.  

I'll help remove the unnecessary ::BIGINT casting and update the version handling. Here are the changes:

cli/src/statbus.cr
```crystal
<<<<<<< SEARCH
          "SELECT version FROM db.migration WHERE version = $1::BIGINT",
=======
          "SELECT version FROM db.migration WHERE version = $1",
>>>>>>> REPLACE
```

```crystal
<<<<<<< SEARCH
            WHERE version #{@migrate_to ? "<= $1::BIGINT" : "IS NOT NULL"}
=======
            WHERE version #{@migrate_to ? "<= $1" : "IS NOT NULL"}
>>>>>>> REPLACE
```

```crystal
<<<<<<< SEARCH
          args = @migrate_to ? [@migrate_to.to_i64.to_s] : [] of String
=======
          args = @migrate_to ? [@migrate_to] : [] of String
>>>>>>> REPLACE
```

These changes:
1. Remove unnecessary ::BIGINT type casting since PostgreSQL will handle the type conversion automatically
2. Keep using Int64/Int8 for version numbers internally
3. Maintain the same functionality while being more concise

> Tokens: 23k sent, 224 received. Cost: $0.07 message, $0.48 session.  
> crystal  
> Create new file? (Y)es/(N)o [Yes]: n  
> Skipping edits to crystal  
> Applied edit to cli/src/statbus.cr  

Version and model info

Aider v0.72.1
Model: with diff edit format
Git repo: .git with 1,387 files
Warning: For large repos, consider using --subtree-only and .aiderignore
See: https://aider.chat/docs/faq.html#can-i-use-aider-in-a-large-mono-repo
Repo-map: using 4096 tokens, auto refresh
Warning: map-tokens > 2048 is not recommended. Too much irrelevant code can confuse LLMs.
Added cli/src/statbus.cr to the chat.
Added app/CONVENTIONS.md to the chat (read-only).
Restored previous conversation history.

@jhf jhf changed the title Incorrect handling of multiple edits to the same file Incorrect handling of multiple edits to the same file when LLM skips filename for repeat blocks and uses block type specifiers Jan 22, 2025
jhf added a commit to veridit/aider that referenced this issue Jan 22, 2025
jhf added a commit to veridit/aider that referenced this issue Jan 22, 2025
jhf added a commit to veridit/aider that referenced this issue Jan 22, 2025
@jhf
Copy link
Author

jhf commented Jan 22, 2025

I did try to reproduce it here veridit@caf184f but the tests pass, and do not apply only the first edit, and I can not find any trace of coder.io asking to include the file crystal.
I was hoping for a quick reproduction, and hence a fix, but the reproduction escapes me at the moment.

@jhf
Copy link
Author

jhf commented Jan 26, 2025

The failure to edit files happens intermittently, here is another chat log showing the problem:

#### `docker compose exec web  python manage.py shell < tmp/test.py  
#### the input device is not a TTY`  

Let me fix that. We'll create a proper Python script that can run without TTY:

tmp/test.py
```python
<<<<<<< SEARCH
=======
#!/usr/bin/env python
from transport.models import Reservations
from django.utils import timezone
import datetime
import sys

def main():
    # Get current date
    current_date = timezone.now().date()

    # Set up test date range
    start_date = current_date
    end_date = current_date + datetime.timedelta(days=3)

    # Create proper timezone-aware datetimes
    end_datetime = timezone.make_aware(datetime.datetime.combine(end_date, datetime.time(23, 59, 59)))
    start_datetime = timezone.make_aware(datetime.datetime.combine(start_date, datetime.time(0, 0, 0)))

    # Try the query
    reservations = Reservations.objects.filter(
        planned_departure_datetime__lt=end_datetime,
        planned_return_datetime__gt=start_datetime,
        reservation_cancelled=False,
        cancelled_by_driver=False,
        coordinator_approval='Approved'
    ).select_related('vehicle', 'assigned_driver', 'billing_department')

    # Print query details
    sys.stdout.write(f"\nQuery parameters:\n")
    sys.stdout.write(f"Start datetime: {start_datetime}\n")
    sys.stdout.write(f"End datetime: {end_datetime}\n")

    # Check results
    sys.stdout.write(f"\nFound {reservations.count()} reservations\n")

    # Print each reservation
    for r in reservations:
        sys.stdout.write(f"\nReservation {r.id}:\n")
        sys.stdout.write(f"Vehicle: {r.vehicle.vehicle_no}\n")
        sys.stdout.write(f"Departure: {r.planned_departure_datetime}\n")
        sys.stdout.write(f"Return: {r.planned_return_datetime}\n")
        sys.stdout.write(f"Driver: {r.assigned_driver.first_name} {r.assigned_driver.last_name}\n")
        sys.stdout.write(f"Department: {r.billing_department.name}\n")

if __name__ == '__main__':
    main()
>>>>>>> REPLACE
```

To run it:
```bash
mkdir -p tmp
chmod +x tmp/test.py
docker compose exec web python tmp/test.py
```

This version:
1. Uses sys.stdout.write instead of print
2. Wraps code in a main() function
3. Includes proper script execution check
4. Can run without TTY
5. Has executable permissions

> Tokens: 30k sent, 508 received. Cost: $0.10 message, $1.73 session.  
> python  
> Create new file? (Y)es/(N)o [Yes]: n  
> Skipping edits to python  

@paul-gauthier
Copy link
Collaborator

Thanks for trying aider and filing this issue.

Tokens: 30k sent means you are probably overwhelming the LLM. 25k+ tokens is usually too many.

https://aider.chat/docs/troubleshooting/edit-errors.html

What model are you using? You shared all the announce lines but seem to have remove the model info.

@github-actions github-actions bot added the question Further information is requested label Jan 27, 2025
jhf added a commit to veridit/aider that referenced this issue Jan 27, 2025
@jhf
Copy link
Author

jhf commented Jan 27, 2025

I've used this model when experiencing the errors:

Aider v0.72.3
Main model: claude-3-5-sonnet-20241022 with diff edit format, infinite output
Weak model: claude-3-5-haiku-20241022
Git repo: .git with 1,393 files
Warning: For large repos, consider using --subtree-only and .aiderignore
See: https://aider.chat/docs/faq.html#can-i-use-aider-in-a-large-mono-repo
Repo-map: using 4096 tokens, auto refresh
Added app/CONVENTIONS.md to the chat (read-only).

Notice that the problem doesn't seem to be malformed output from the LLM - or at least - the output I find in .aider.chat.history.md does work correctly when I put it in a test.

I made a test of the last example, using the same reply as found in the chat history, and the test passes. The test is here https://github.com/veridit/aider/blob/d61f7820db4fe7b40ca6153a8b9da85b6fda6f1d/tests/basic/test_coder.py#L612

Does aider modify the LLM output before writing it to the history? Meaning that LLM errors can not be found from in the history?

You are right, I'm at the limit of what the models handle, but the handling of the LLM response seems incorrect.

@paul-gauthier
Copy link
Collaborator

This looks like a dup of #2879. Please see the comments there for more information, and feel free to continue the discussion within that issue.

I'm going to close this issue for now. But please let me know if you think this is actually a distinct issue and I will reopen this issue.

@jhf
Copy link
Author

jhf commented Jan 27, 2025

Yes, it is clearly a duplicate of #2879 will continue there.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
question Further information is requested
Projects
None yet
Development

No branches or pull requests

2 participants