[Bug]: AutoScrapeService crash with large collection #552
Comments
If you wish, I can help you with this, as I made a recursive function that gets data from the database into a JSON array using limit and offset (rows 1-1000, then 1001-2000, and so on) until it has fetched all the rows. I wrote that function because I personally had this exact issue: when a query returns a very big result, the app uses too much memory and the cursor of course gets full.
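For reference, a minimal sketch of that limit/offset idea (my own illustration, not the code from the gist I link below; the `files` table and column names are placeholders):

```java
import android.database.Cursor;
import android.database.sqlite.SQLiteDatabase;

// Page through a table in fixed-size chunks so a single CursorWindow
// never has to hold the whole result set.
static void readAllRows(SQLiteDatabase db) {
    final int pageSize = 1000;
    int offset = 0;
    while (true) {
        Cursor c = db.rawQuery(
                "SELECT _id, _data FROM files LIMIT " + pageSize
                        + " OFFSET " + offset, null);
        try {
            if (!c.moveToFirst()) break;   // no rows left, we are done
            do {
                // process the row / append it to the JSON array
            } while (c.moveToNext());
        } finally {
            c.close();
        }
        offset += pageSize;
    }
}
```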
@okan35 thanks for proposing to help: it is very welcome. I did implement a window-based strategy as well, like https://gist.github.com/courville/1ff449bbc6b9afc42a9d43eba63cae2b
By the way, I see there is a blob exception. You don't store an image or something like that in the table as a blob, right? If you are doing that, the solution below won't work for it. Basically, the main outcome of this function is to limit the number of rows you fetch from the table, so memory use stays bounded and the cursor always holds at most your set limit (or whatever is left in the last batch of 1000, or whatever your limit is). https://gist.github.com/okan35/1c57e4f545a8ddabf9976e0699ba2bc4 I roughly put it here; you will need to adapt it to your code. If I am seeing correctly, the reason your code is not working is this line:
In the code, I need to count the total number of files to be scraped in order to report progress to the UI.
Then a simple `select count(someIndexedColumn) from yourTable` should not crash the app. You didn't say whether you have a blob in the table: do you store an image or other bytes in it? Is it possible to share this table with me, so I could check it?
Thanks for the help. There is no binary blob in the database, just text entries. Since the database is modified during the scraping, the cursor lists the files not scraped yet. To dump the nova mediaDb you can click 10 times on nova settings -> software decoding; it enables more options, one of them at the end of the settings list being the media database export. You will then need to pull it from internal storage. I will try to dump the db at the start of the scrape process. BTW the crash is in the moveToNext().
I want to be able to reproduce this, but will it be enough if I download the main repository? And where would I find the query that results in the crash? I have the database, by the way; I first wanted to see what that query returns in a SQLite browser.
To reproduce the issue I scraped a huge video collection on an SMB drive (I can provide the fake collection with almost-zero-size videos). To build the application you need to follow the instructions at https://github.com/nova-video-player/aos-AVP; alternatively you can use the docker build available at https://github.com/nova-video-player/aos-AVP/tree/nova/docker
What happens if you just do a simple query limited to 10 or so rows?
FYI, the crash happens when you have more than 20k videos to process and thus after several paged queries in the while loop. |
This will be kind of hard for me to reproduce as I don't have much time, but as I said, what I would do is fetch only 10 or 1000 rows of data, just to see whether it crashes. Then we would know whether it crashes after a certain number of rows, or whether a certain row has strange data that causes the crash. Maybe fetching only an id column for all the rows would also tell us something: if it doesn't crash on moveToNext, it means we load too much row data into the cursor, which could perhaps be solved by dropping some columns and loading that data later. Or, if possible, when catching the exception, try to print the current cursor data with something like DatabaseUtils.dumpCursorToString() (can't remember the exact method name). That would also show us where it fails; maybe some API returned too much text about some movie.
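A rough sketch of that diagnostic (my own illustration; the actual Android helpers are DatabaseUtils.dumpCurrentRowToString() for the row under the cursor and DatabaseUtils.dumpCursorToString() for the whole cursor):

```java
import android.database.Cursor;
import android.database.DatabaseUtils;
import android.util.Log;

// Wrap the iteration so the offending row, if any, gets logged before rethrow.
// Note: the dump itself may fail if the cursor has no valid current row.
static void iterate(Cursor cursor) {
    try {
        while (cursor.moveToNext()) {
            // process one row
        }
    } catch (RuntimeException e) {
        Log.e("Scraper", "cursor failed near position " + cursor.getPosition()
                + ": " + DatabaseUtils.dumpCurrentRowToString(cursor), e);
        throw e;
    } finally {
        cursor.close();
    }
}
```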
OK, I did the catch on moveToNext().
Will track these later. |
If it doesn't crash with, let's say, 10-100 or 500 rows per query, that means 1000 is too much; my guess would be that an "about movie" text is taking too much space, so maybe saving a compressed version of that text would solve your issue, decompressing it only when it is used. I do something like this in an API I work on; compressing text is really fast and saves a lot of space. Also, is it possible for you to share the full query that appears in the exception? I am really curious what kind of query it is. Another thing I just thought of, because of the log you showed even though you set the window size to 10: maybe set the window size to 1 and see whether this cursor error appears again; that would definitely mean the data you are pulling has too much text in it. This kind of trial will give you an idea about what is wrong. If that is the case, you will need to somehow store less text.
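If the big-text theory holds, here is a minimal sketch of the compress-on-write idea using plain java.util.zip (an illustration of the suggestion, not project code; it assumes well-formed compressed input on the read path):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Deflate a long description before storing it, inflate it only on display.
static byte[] compress(String text) throws IOException {
    Deflater deflater = new Deflater(Deflater.BEST_SPEED);
    deflater.setInput(text.getBytes("UTF-8"));
    deflater.finish();
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] buf = new byte[4096];
    while (!deflater.finished()) {
        out.write(buf, 0, deflater.deflate(buf));
    }
    deflater.end();
    return out.toByteArray();
}

static String decompress(byte[] data) throws IOException, DataFormatException {
    Inflater inflater = new Inflater();
    inflater.setInput(data);
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] buf = new byte[4096];
    while (!inflater.finished()) {
        out.write(buf, 0, inflater.inflate(buf));
    }
    inflater.end();
    return out.toString("UTF-8");
}
```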
OK, after some digging, it seems that the following code, though working in VideoStoreImportImpl, does not have its QUERY_ARG_LIMIT Bundle argument honored on this query path, so the paging was never actually applied.
Refs: |
use limit appendQueryParameter("limit", offset + "," + limit) on uri instead, since it is processed in VideoProvider. Note that QUERY_ARG_LIMIT is verified to work in VideoStoreImportImpl. See nova-video-player/aos-AVP#552. Thanks seppel for the support!!!
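For illustration, a minimal sketch of what that fix amounts to (the method name, uri, and projection here are assumptions, not project code; whether the "limit" parameter is honored depends on the provider's query() implementation, VideoProvider in this case):

```java
import android.content.ContentResolver;
import android.database.Cursor;
import android.net.Uri;

// Page a ContentResolver query by encoding "offset,limit" into a "limit"
// uri query parameter that the provider parses into a SQL LIMIT clause.
static void scanAll(ContentResolver resolver, Uri baseUri, String[] projection) {
    final int window = 2000;
    int offset = 0;
    while (true) {
        Uri pagedUri = baseUri.buildUpon()
                .appendQueryParameter("limit", offset + "," + window)
                .build();
        Cursor c = resolver.query(pagedUri, projection, null, null, null);
        if (c == null) break;
        try {
            if (!c.moveToFirst()) break;   // no more rows
            do {
                // process one scraped-file row
            } while (c.moveToNext());
        } finally {
            c.close();
        }
        offset += window;
    }
}
```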
As far as I understand, even if you set the ContentResolver limit to 1, the cursor can still be full. My theory is that even when you load only 10 rows your cursor gets full, which means some row in one of the tables has too much data in it. I checked my exported db and saw that some text columns hold about 350 bytes of data, and as far as I know a couple of those could fill your cursor. My suggestion would be to change the query just for testing: don't select the columns that contain anything beyond the movie name, director name, etc., just to see whether the cursor still gets full.
@okan35 I agree with your analysis and I will try to debug it. However, the issue with contentResolver paging had to be solved too. Now it is done, and I can identify which row makes the thing blow up.
I think your best bet is to fetch only the ids of those big-text rows instead of fetching the texts along with everything else as you do now, and then use the id at the moment the big text is actually needed. The user would click on one show, and that is where you would run a query with the id to get that text; since you would only get one row back, the cursor will hopefully be fine and the operation will be very fast.
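In other words, a lazy-load pattern along these lines (a sketch with an assumed uri and a hypothetical m_plot column name, not project code):

```java
import android.content.ContentResolver;
import android.content.ContentUris;
import android.database.Cursor;
import android.net.Uri;

// List queries skip the large text column entirely; the description is
// fetched by id only when the user opens the item.
static String loadDescription(ContentResolver resolver, Uri baseUri, long id) {
    Uri rowUri = ContentUris.withAppendedId(baseUri, id);
    Cursor c = resolver.query(rowUri, new String[] { "m_plot" },
            null, null, null);
    try {
        return (c != null && c.moveToFirst()) ? c.getString(0) : null;
    } finally {
        if (c != null) c.close();
    }
}
```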
Tonight I will dump the database containing the scan of the SMB drive before the scraping process starts (deactivating all the …).
That would be great. I am really curious as to where the issue is, and since I also have my db, I can check what your queries return on my side too.
The query is simple, and the data are almost empty.
I will run this in the evening, but if the data are almost empty, this is very strange, like very very strange.
So I just ran this query without any problem.
I ran the query on the full 20k fake-video db and found nothing that could really be an issue for a cursor with a window of 2000 rows.
But does that fake db have real scraped data in it?
Yes it does: it is based on a scrape of 20k fake video files with real names, so it corresponds to a real situation.
I can share the full db and even the fake archive of 20k videos (2MB compressed). |
Ok. I will send you an email. |
Ok, so I just checked the row with the most data, which is about 305 bytes, so I don't get how this can give you a cursor-full problem. I say this because, according to this answer, the cursor window is 2MB, so a single row would have to exceed 2MB to fail: https://stackoverflow.com/a/45678400 I don't think you have a 2MB row of data in that fake-video db table. What I would suggest, if you can, is to simply not select the _data and title columns, just to see whether your query still breaks.
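Testing that suggestion is just a matter of trimming the projection (a sketch; the method name and uri are assumptions, the column names are the ones mentioned above):

```java
import android.content.ContentResolver;
import android.database.Cursor;
import android.net.Uri;

// Run the same query but project only the id column, deliberately leaving
// out the suspect text columns (_data, title), to see if the crash goes away.
static int countWithSlimProjection(ContentResolver resolver, Uri baseUri) {
    Cursor c = resolver.query(baseUri, new String[] { "_id" },
            null, null, null);
    try {
        return (c != null) ? c.getCount() : 0;
    } finally {
        if (c != null) c.close();
    }
}
```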
Problem description
Encountered:
SQLiteQuery: exception: Row too big to fit into CursorWindow
Steps to reproduce the issue
Scrape a large collection.
Expected behavior
No response
Your phone/tablet/androidTV model
nvidia shield
Operating system version
Android 11
Application version and app store
6.0.xx
Additional system information
No response
Debug logs