csvlink latlong comparator failing #80
@fgregg, hopefully this is still maintained! Fantastic package, and hugely helpful.
I'm getting a similar error when passing a LatLong field in a CSV. This is a dedupe job within a single CSV. I'm running dedupe 1.8.1 with Python 3.6 on macOS.
Did anyone come up with a solution to this? I've tried storing my location column in the CSV as:
... none of which worked.
I was able to solve this problem by storing each value in the LatLong column as a tuple of floats rather than a string, i.e. setting the values in the LatLong column to `(123.45, 123.45)`. You can see this in the example in the dedupe docs.
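A minimal sketch of that fix, assuming the coordinates arrive as strings like `(-37.985132, 145.214008)` and that the records are held as a dict of dicts keyed by record ID (the shape dedupe's Python API takes). `parse_latlong` and the sample records are hypothetical, not part of dedupe or csvdedupe:

```python
def parse_latlong(value):
    """Convert a cell such as '(-37.985132, 145.214008)' into (lat, lon) floats.

    Hypothetical helper. Blank cells become None, which is how dedupe
    represents a missing value.
    """
    value = value.strip()
    if not value:
        return None
    # Drop the surrounding parentheses, split on the comma, and let float()
    # ignore the whitespace around each number.
    lat, lon = value.strip("()").split(",")
    return (float(lat), float(lon))


# Hypothetical records keyed by record ID; only the Geolocation field matters here.
records = {
    0: {"Geolocation": "(-37.985132, 145.214008)"},
    1: {"Geolocation": ""},
}

# Replace each string with a tuple of floats (or None) before handing the
# records to dedupe.
for record in records.values():
    record["Geolocation"] = parse_latlong(record["Geolocation"])

print(records)
```

Rows with blank coordinates would likely also need `"Has Missing": true` on the `Geolocation` field definition.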
Hi,
I'm attempting to link two CSV files, and the LatLong comparator is failing because the fields are being treated as strings.
Error:
```
INFO:root:taking a sample of 150000 possible pairs
Traceback (most recent call last):
  File "/usr/local/bin/csvlink", line 11, in <module>
    sys.exit(launch_new_instance())
  File "/usr/local/lib/python3.6/site-packages/csvdedupe/csvlink.py", line 210, in launch_new_instance
    d.main()
  File "/usr/local/lib/python3.6/site-packages/csvdedupe/csvlink.py", line 134, in main
    deduper.sample(nonexact_1, nonexact_2, self.sample_size)
  File "/usr/local/lib/python3.6/site-packages/dedupe/api.py", line 849, in sample
    original_length_2)
  File "/usr/local/lib/python3.6/site-packages/dedupe/labeler.py", line 321, in sample_product
    sample_size)
  File "/usr/local/lib/python3.6/site-packages/dedupe/labeler.py", line 67, in sample_product
    deque_2)
  File "/usr/local/lib/python3.6/site-packages/dedupe/sampling.py", line 23, in blockedSample
    *args))
  File "/usr/local/lib/python3.6/site-packages/dedupe/sampling.py", line 122, in linkSamplePredicates
    yield linkSamplePredicate(subsample_size, predicate, items1, items2)
  File "/usr/local/lib/python3.6/site-packages/dedupe/sampling.py", line 144, in linkSamplePredicate
    block_keys = predicate_function(column)
  File "/usr/local/lib/python3.6/site-packages/dedupe/predicates.py", line 422, in latLongGridPredicate
    return (str([round(dim, digits) for dim in field]),)
  File "/usr/local/lib/python3.6/site-packages/dedupe/predicates.py", line 422, in <listcomp>
    return (str([round(dim, digits) for dim in field]),)
TypeError: type str doesn't define __round__ method
```
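The last frames show `latLongGridPredicate` calling `round()` on each element of the field. The sketch below is a hypothetical standalone copy of that one expression (not dedupe's actual function; the `digits=1` default is an assumption). It shows why a string value fails: iterating over a string yields characters, and `round()` cannot take a `str`.

```python
def lat_long_grid_key(field, digits=1):
    # Hypothetical copy of the expression in the traceback's last frame:
    # round each coordinate, then turn the list into a string block key.
    return (str([round(dim, digits) for dim in field]),)


# A tuple of floats produces a grid key.
print(lat_long_grid_key((-37.985132, 145.214008)))

# A string read straight from the CSV is iterated character by character,
# so round() receives "(" and raises TypeError, as in the traceback.
try:
    lat_long_grid_key("(-37.985132, 145.214008)")
except TypeError as exc:
    print("TypeError:", exc)
```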
Config:
```json
{
  "field_names": ["Account_Name", "Mailing_Street", "Mailing_Zip", "Mailing_Country", "Mailing_City", "Mailing_State", "Entity_Legal_Name", "Australian_Business_Number", "Geolocation"],
  "field_definition": [
    {"field": "Account_Name", "type": "String"},
    {"field": "Mailing_Street", "type": "String", "Has Missing": true},
    {"field": "Mailing_Zip", "type": "String", "Has Missing": true},
    {"field": "Mailing_City", "type": "String"},
    {"field": "Mailing_State", "type": "String"},
    {"field": "Mailing_Country", "type": "Exact"},
    {"field": "Entity_Legal_Name", "type": "Exact", "Has Missing": true},
    {"field": "Geolocation", "type": "LatLong"},
    {"field": "Australian_Business_Number", "type": "String", "Has Missing": true}
  ],
  "output_file": "output.csv",
  "skip_training": false,
  "training_file": "training.json",
  "sample_size": 150000,
  "recall_weight": 2
}
```
The `Geolocation` data in the CSV looks like:
```
(-37.985132, 145.214008)
```