
[Bug]: AutoScrapeService crash with large collection #552

Open
courville opened this issue Jan 26, 2022 · 28 comments
Assignees
Labels
bug Something isn't working

Comments

@courville
Contributor

Problem description

Encountered SQLiteQuery exception: Row too big to fit into CursorWindow.

Steps to reproduce the issue

Scrape a large collection.

Expected behavior

No response

Your phone/tablet/androidTV model

nvidia shield

Operating system version

Android 11

Application version and app store

6.0.xx

Additional system information

No response

Debug logs

01-26 20:43:55.140 11027 11078 E SQLiteLog: (9) statement aborts at 50: [SELECT _id, poster_id, _data, COALESCE(scraper_name,title) AS name, bookmark, Archos_bookmark, Archos_lastTimePlayed, COALESCE(cover,'') AS cover, duration, _size, Archos_traktSeen, Ar
01-26 20:43:55.204  3705  3775 D WindowManager: handleComboKeys keyCode: 22, keyAction: 1
01-26 20:43:55.220 11027 11269 E SQLiteConnection: startPos 8292 > actual rows 7684
01-26 20:43:55.221 11027 11269 E SQLiteQuery: exception: Row too big to fit into CursorWindow requiredPos=10853, totalRows=7684; query: SELECT _id, _data, title, m_id, e_id, ArchosMediaScraper_type, video_online_id, e_season FROM video WHERE (ArchosMediaScraper_id=0 AND Archos_hideFil
--------- beginning of crash
01-26 20:43:55.233 11027 11269 E AndroidRuntime: FATAL EXCEPTION: Thread-22
01-26 20:43:55.233 11027 11269 E AndroidRuntime: Process: org.courville.nova, PID: 11027
01-26 20:43:55.233 11027 11269 E AndroidRuntime: android.database.sqlite.SQLiteBlobTooBigException: Row too big to fit into CursorWindow requiredPos=10853, totalRows=7684
01-26 20:43:55.233 11027 11269 E AndroidRuntime:        at android.database.sqlite.SQLiteConnection.nativeExecuteForCursorWindow(Native Method)
01-26 20:43:55.233 11027 11269 E AndroidRuntime:        at android.database.sqlite.SQLiteConnection.executeForCursorWindow(SQLiteConnection.java:1001)
01-26 20:43:55.233 11027 11269 E AndroidRuntime:        at android.database.sqlite.SQLiteSession.executeForCursorWindow(SQLiteSession.java:838)
01-26 20:43:55.233 11027 11269 E AndroidRuntime:        at android.database.sqlite.SQLiteQuery.fillWindow(SQLiteQuery.java:62)
01-26 20:43:55.233 11027 11269 E AndroidRuntime:        at android.database.sqlite.SQLiteCursor.fillWindow(SQLiteCursor.java:161)
01-26 20:43:55.233 11027 11269 E AndroidRuntime:        at android.database.sqlite.SQLiteCursor.onMove(SQLiteCursor.java:131)
01-26 20:43:55.233 11027 11269 E AndroidRuntime:        at android.database.AbstractCursor.moveToPosition(AbstractCursor.java:248)
01-26 20:43:55.233 11027 11269 E AndroidRuntime:        at android.database.AbstractCursor.moveToNext(AbstractCursor.java:280)
01-26 20:43:55.233 11027 11269 E AndroidRuntime:        at android.database.CursorWrapper.moveToNext(CursorWrapper.java:206)
01-26 20:43:55.233 11027 11269 E AndroidRuntime:        at com.archos.mediaprovider.CustomCursorFactory$CustomCursor.moveToNext(CustomCursorFactory.java:82)
01-26 20:43:55.233 11027 11269 E AndroidRuntime:        at android.database.CursorWrapper.moveToNext(CursorWrapper.java:206)
01-26 20:43:55.233 11027 11269 E AndroidRuntime:        at com.archos.mediascraper.AutoScrapeService$3.run(AutoScrapeService.java:387)
@courville courville added the bug Something isn't working label Jan 26, 2022
@courville courville self-assigned this Jan 26, 2022
@okan35

okan35 commented Jan 28, 2022

If you wish I can help you with this: I wrote a recursive function that reads rows from the database into a JSON array using limit and offset (1-1000, 1001-2000, and so on) until it has fetched all the rows.

I wrote that function because I personally hit this exact issue: when a query returns a very large result, the app uses too much memory and the cursor window fills up, of course.
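The batched limit/offset idea can be sketched like this (a minimal plain-Java illustration; `PagedFetch` and `fetchPage` are hypothetical stand-ins for the real `SELECT ... LIMIT pageSize OFFSET offset` query, not code from the app):

```java
import java.util.ArrayList;
import java.util.List;

public class PagedFetch {
    // Stand-in for "SELECT ... LIMIT pageSize OFFSET offset":
    // returns at most pageSize rows starting at offset.
    static List<String> fetchPage(List<String> table, int offset, int pageSize) {
        int end = Math.min(offset + pageSize, table.size());
        if (offset >= end) return new ArrayList<>();
        return new ArrayList<>(table.subList(offset, end));
    }

    // Walk the whole table page by page, so no single cursor window
    // ever has to hold more than pageSize rows at once.
    static List<String> fetchAll(List<String> table, int pageSize) {
        List<String> result = new ArrayList<>();
        int offset = 0;
        while (true) {
            List<String> page = fetchPage(table, offset, pageSize);
            result.addAll(page);
            if (page.size() < pageSize) break; // last (possibly partial) page
            offset += pageSize;
        }
        return result;
    }
}
```

The loop terminates when a page comes back smaller than the requested size, which also covers the empty-table case.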

@courville
Contributor Author

courville commented Jan 28, 2022

@okan35 thanks for offering to help: it is very welcome. I did implement a window-based strategy as well, like https://gist.github.com/courville/1ff449bbc6b9afc42a9d43eba63cae2b
But it does not seem to work. Perhaps my window is too big.
If you want to review the AutoScrapeService code and propose a solution, please do.
And yes, a clean function for that would indeed be better.

@okan35

okan35 commented Jan 29, 2022

By the way, I see there is a blob exception: you don't store an image or anything like that in the table as a blob, right? Because if you do, the solution below won't work for it.

The main point of this function is to limit the number of rows you fetch from the table at a time, capping the memory needed to process them; the cursor will always contain your set limit, or whatever is left in the last batch of 1000 (or whatever your limit is).

https://gist.github.com/okan35/1c57e4f545a8ddabf9976e0699ba2bc4 I roughly put it here; you will need to adapt it to your code.

If I am seeing correctly, the reason your code is not working is this line:

Cursor cursor = getFileListCursor(PARAM_ALL, null);
final int numberOfRows = cursor.getCount(); // THIS LINE
cursor.close();

The reason is that you are asking for the count of that huge table, which will cause issues; your cursor should not need to know how big your table is.

@courville
Contributor Author

courville commented Jan 30, 2022

The code needs to count the total number of files to be scraped in order to report it to the UI (sTotalNumberOfFilesRemainingToProcess). This is computed once and the cursor is then closed.
I am not sure the counting itself is the culprit, since the logs show the crash while walking the secondary cursor (while (cursor.moveToNext())), and that part of the code uses the window trick.
But since it crashes even with a smaller window, any review and suggestion are welcome.

@okan35

okan35 commented Jan 30, 2022

Then a simple select count(someIndexedColumn) from yourTable should not crash the app. You didn't say whether you have a blob in the table: do you have an image or similar stored in bytes in your table?

Is it possible to share this table with me so I could check it?

@courville
Contributor Author

courville commented Jan 30, 2022

Thanks for the help. There is no binary blob in the database, just text entries. Since the database is modified during scraping (the cursor lists files not yet scraped in VideoStore.Video.Media.EXTERNAL_CONTENT_URI via getFileListCursor in AutoScrapeService), it is not easy to dump the database at the start, before the unscraped entries get processed.

To dump the nova mediaDb, click 10 times on nova settings->software decoding; this enables more options, one of them, at the end of the settings list, being media database export. You will then need to pull it from internal storage: it is located in /sdcard/org.courville.nova-media.db on v5, but in /sdcard/Android/org.courville.nova/files/org.courville.nova-media.db on v6.

I will try to dump the db at the start of the scrape process.

BTW the crash is in cursor.moveToNext() and not in the count.

@okan35

okan35 commented Jan 30, 2022

I want to be able to reproduce this, but will it be enough if I download the main repository? And where would I find the query that results in the crash? I have the database, by the way; I first wanted to see what that query returns in sqlite browser.

@courville
Contributor Author

To reproduce the issue I scraped a huge video collection on an SMB drive (I can provide the fake collection, with almost 0-size videos). To build the application you need to follow the instructions at https://github.com/nova-video-player/aos-AVP; alternatively you can use the docker build available at https://github.com/nova-video-player/aos-AVP/tree/nova/docker
I will try later today to dump the db before it gets processed by the autoscraping.

@okan35

okan35 commented Jan 30, 2022

What happens if you just do a simple SELECT count(_id) AS result FROM EPISODE, just to see if cursor.moveToNext() crashes again (this takes just 5 ms)? Or run it for 10 rows and see if that part crashes. And if you can send me the query that results in the crash, it would be helpful. I found the AutoScrapeService file, by the way, so I am checking that too.

@courville
Contributor Author

FYI, the crash happens when there are more than 20k videos to process, and thus after several paged queries in the while loop.
I know the code is bloated and has been like this since the initial open-source release from the Archos days (I never took the time to refactor it).

@okan35

okan35 commented Jan 30, 2022

This will be kind of hard for me to reproduce as I don't have much time, but what I would do, as I said, is run with only 10 or 1000 rows of data just to see whether it crashes.

That way we would know whether it crashes after a certain number of rows, or whether a certain row has strange data that causes the crash.

Getting only some id data for all the rows would also tell us something: if it doesn't crash on moveToNext, it means we load too much row data into the cursor, which could perhaps be solved by dropping some columns and fetching that data later.

Or, if possible, on catching the exception try to print the current cursor data with something like DatabaseUtils.dumpCursorToString() (can't remember the exact method name). That would also show us where it fails; maybe some API returned too much text about a movie.

@courville
Contributor Author

OK, I did the catch on SQLiteBlobTooBigException and implemented DatabaseUtils.dumpCursorToString(cursor).
Now running with WINDOW_SIZE=10; I will report later if it crashes. For now it is scraping the fake NAS of 20k videos.
Something is wrong, because I get some sporadic:

W CursorWindow: Window is full: requested allocation 60 bytes, free space 46 bytes, window size 2097152 bytes
E SQLiteLog: (9) statement aborts at 16: [SELECT _id, poster_id, _data, COALESCE(scraper_name,title) AS name, bookmark, Archos_bookmark, Archos_lastTimePlayed, COALESCE(cover,'') AS cover, duration, _size, Archos_traktSeen, Ar

I will track these down later.

@okan35

okan35 commented Jan 30, 2022

If it doesn't crash with, let's say, 10, 100 or 500 rows per query, that means 1000 is too much. My guess would be that an "about movie" text might be taking too much space, so saving a compressed version of that text, and only decompressing it when it is used, might solve your issue.

I do something like this in an API project; compressing text is really fast and saves a lot of space.
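The compress-on-write, decompress-on-read idea could look roughly like this with java.util.zip (a hedged sketch; TextBlob is a hypothetical helper, not part of the app):

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class TextBlob {
    // Compress a long text (e.g. a plot summary) before storing it in the db.
    static byte[] compress(String text) {
        byte[] input = text.getBytes(StandardCharsets.UTF_8);
        Deflater deflater = new Deflater();
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (!deflater.finished()) out.write(buf, 0, deflater.deflate(buf));
        deflater.end();
        return out.toByteArray();
    }

    // Decompress only when the text is actually displayed.
    static String decompress(byte[] data) {
        Inflater inflater = new Inflater();
        inflater.setInput(data);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        try {
            while (!inflater.finished()) out.write(buf, 0, inflater.inflate(buf));
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("corrupt compressed data", e);
        }
        inflater.end();
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }
}
```

Note this trades CPU for cursor-window space; it only helps if the oversized rows really are large text columns.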

Also, is it possible for you to share the full query that is thrown in the exception? I am really curious what kind of query it is.

Another thing I just thought of, because of the log you showed even though you set the window size to 10:

Maybe set the window size to 1 and see if this cursor error shows up again; if it does, that definitely means the data you are pulling has too much text in it. This kind of trial will definitely give you an idea of what is wrong. If that is the case, you will need to store less text somehow.

@courville
Contributor Author

courville commented Feb 15, 2022

OK, after some digging, it seems that the following code, though it works in VideoStoreImportImpl with the VideoStoreInternal.FILES uri, does not work in AutoScrapeService with VideoStore.Video.Media.EXTERNAL_CONTENT_URI: c.getCount() always returns the full size, even with ContentResolver.QUERY_ARG_LIMIT set to 10, on Android 12:

String WHERE_UNSCANNED = VideoStore.Video.VideoColumns.ARCHOS_MEDIA_SCRAPER_ID + "=0 AND "+ VideoStore.Video.VideoColumns.ARCHOS_HIDE_FILE + "=0 ";
Cursor c = mContentResolver.query(VideoStore.Video.Media.EXTERNAL_CONTENT_URI, SCRAPER_ACTIVITY_COLS, WHERE_UNSCANNED, null, null);
final int numberOfRows2 = c.getCount();
int numberOfRowsRemaining2 = numberOfRows2;
c.close();
int window2 = 10;
log.debug("doScan: total cursor size " + numberOfRows2 + " to be paged with window " + window2);
// break down the scan in batch of WINDOW_SIZE in order to avoid SQLiteBlobTooBigException: Row too big to fit into CursorWindow crash
// note that the db is being modified during import
do {
    if (window2 > numberOfRows2)
        window2 = numberOfRows2;
    if (Build.VERSION.SDK_INT > Build.VERSION_CODES.Q) { // API >= 30 requires a Bundle to pass LIMIT
        final Bundle bundle = new Bundle();
        bundle.putString(ContentResolver.QUERY_ARG_SQL_SELECTION, WHERE_UNSCANNED);
        bundle.putStringArray(ContentResolver.QUERY_ARG_SQL_SELECTION_ARGS, null);
        bundle.putStringArray(ContentResolver.QUERY_ARG_SORT_COLUMNS, new String[]{BaseColumns._ID});
        bundle.putInt(ContentResolver.QUERY_ARG_SORT_DIRECTION, ContentResolver.QUERY_SORT_DIRECTION_ASCENDING);
        bundle.putInt(ContentResolver.QUERY_ARG_LIMIT, window2);
        bundle.putInt(ContentResolver.QUERY_ARG_OFFSET, 0);
        c = mContentResolver.query(VideoStore.Video.Media.EXTERNAL_CONTENT_URI, SCRAPER_ACTIVITY_COLS, bundle, null);
    }
    log.debug("doScan: new batch fetching window=" + window2 + " entries <=" + numberOfRows2 + " remaining " + numberOfRowsRemaining2);
    log.debug("doScan: new batch cursor has size " + c.getCount());
    // TODO process cursor
    numberOfRowsRemaining2 -= window2;
    c.close();
} while (numberOfRowsRemaining2 > 0);

Refs:

courville added a commit to nova-video-player/aos-MediaLib that referenced this issue Feb 15, 2022
use limit appendQueryParameter("limit", offset + "," + limit) on uri instead since it is processed in VideoProvider

note that QUERY_ARG_LIMIT is verified to work in VideoStoreImportImpl

See nova-video-player/aos-AVP#552

Thanks seppel for the support!!!
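As a sketch of what that uri-based workaround involves (hypothetical plain-Java helper, not the actual VideoProvider code): the caller appends appendQueryParameter("limit", offset + "," + limit) to the uri, and the provider parses the value back out. Since SQLite accepts the MySQL-style LIMIT offset,count form, the parsed string can feed the limit clause directly:

```java
public class LimitParam {
    // Parse a "limit" query parameter of the form "offset,limit"
    // (or just "limit") into {offset, limit}, as a content provider
    // feeding it into the SQL LIMIT clause might do.
    static int[] parse(String limitParam) {
        int comma = limitParam.indexOf(',');
        if (comma < 0) return new int[] { 0, Integer.parseInt(limitParam.trim()) };
        return new int[] {
            Integer.parseInt(limitParam.substring(0, comma).trim()),
            Integer.parseInt(limitParam.substring(comma + 1).trim())
        };
    }

    // Build the value the caller appends:
    // uri.buildUpon().appendQueryParameter("limit", build(offset, limit))
    static String build(int offset, int limit) {
        return offset + "," + limit;
    }
}
```

This routes paging through a channel the provider is known to honor, unlike the Bundle args, which apparently never reached this provider's query implementation.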
@okan35

okan35 commented Feb 16, 2022

As far as I understand, even if you set the ContentResolver limit to 1, the cursor will still be full.

My theory is that even when you load 10 rows your cursor gets full, which means some row in one of the tables has too much data in it. I checked my exported db and saw that some text columns hold 350 bytes of data, and a couple of those together could make your cursor full as far as I know.

My suggestion would be to change the query just for testing: don't select the columns that contain anything other than the movie name, director name, etc. I suggest this just to see if the cursor still gets full.

@courville
Contributor Author

@okan35 I agree with your analysis and I will try to debug it. However, the issue with ContentResolver paging had to be solved too. Now that it is done, I can identify which row makes the thing blow up.

@okan35

okan35 commented Feb 16, 2022

I think, as far as I can tell from your scenario, your best bet is to fetch only the ids of those big text rows instead of fetching them along with everything else, and then use those ids at the time the big texts are actually needed.

For example, the user would click on a show, and that is when you would run a query with the id to get the big text; since you would only fetch one row, the cursor will hopefully be fine and the operation will be very fast.

@courville
Contributor Author

Tonight I will dump the database containing the scan of the SMB drive before the scraping process starts (deactivating all the onStartCommand calls in AutoScrapeService). I will then run the SQL queries on the db on my computer to see if there is any text that is too big. I do not see where it could come from, since at this point we deal only with filenames that have not been scraped yet.
I will share the db and the raw queries performed.

@okan35

okan35 commented Feb 16, 2022

That would be great, I am really curious as to where the issue is and since I also have my db I can also see with your queries what the result is.

@courville
Contributor Author

Query is simple:

select _id, _data, title, m_id, e_id, ArchosMediaScraper_type, video_online_id, e_season
from video
where ArchosMediaScraper_id = 0 and Archos_hideFile = 0

and the data are almost empty.

@okan35

okan35 commented Feb 16, 2022

I will run this in the evening, but if the data is almost empty, this is very, very strange.

@okan35

okan35 commented Feb 16, 2022

So I just ran this query without "where ArchosMediaScraper_id = 0 and Archos_hideFile = 0" (because otherwise I got nothing), and most of the data was in the _data column: about 200 characters. Maybe a couple of those together might be making your cursor full. I think if you try without the _data column it will work, or you could remove columns one by one until your cursor no longer gets full.

@courville
Contributor Author

I ran the query on the full 20k fake video db and found nothing that could really be an issue for a cursor with a window of 2000.
I did a full scrape with window=10 and no crash.
I will redo it with window=2000 to be sure.
If it works I will close the issue.

@okan35

okan35 commented Feb 16, 2022

But does that fake db have the _data column filled? Like, does that column contain normal data?

@courville
Contributor Author

Yes it does: it is based on a scrape of 20k fake video files with real names, so it corresponds to a real situation.

@courville
Contributor Author

I can share the full db and even the fake archive of 20k videos (2 MB compressed).
Just send me a mail at software at courville dot org for me to share the link.

@okan35

okan35 commented Feb 17, 2022

Ok. I will send you an email.

@okan35

okan35 commented Feb 17, 2022

Ok, so I just checked the row with the most data, which is about 305 bytes, so I don't get how this can give you a cursor-full problem. I say this because the answer below states that the cursor window size is 2 MB per row; I don't think you have a 2 MB row of data in that fake video db table.

https://stackoverflow.com/a/45678400

What I would suggest, if you can, is to not select the _data and title columns, just to see if your query still breaks.
