extract_nwis_df() function returns a tuple with dataframe and dictionary #112

Closed
nlamkey opened this issue Jan 11, 2022 · 4 comments

@nlamkey

nlamkey commented Jan 11, 2022

  • HydroFunctions version: 0.2.1
  • Python version: 3.8.8
  • Operating System: windows

Description

I wrote some code a year ago that used hydrofunctions to build processed dataframes. I ran it again today and found that it no longer works. The problem lies in the extract_nwis_df function: it used to return just a dataframe, but now it returns a tuple containing a dataframe and a dictionary. In one instance it also returned 4 more columns than I requested, though this might have been a separate issue. I found a workaround by subsetting the tuple with [0]. Is there a more elegant way to fix this workflow?

What I Did

def create_df(site, start, end):
    # YOUR CODE HERE
    """Creates a Pandas DataFrame with data
    downloaded from NWIS using hydrofunctions.
    Renames columns containing discharge and
    qualification codes information to "discharge" and
    "flag", respectively. Creates "siteName", "latitude",
    and "longitude" columns. Outputs the new dataframe.

    Parameters
    ----------
    site : str
    The stream gauge site number.

    start : str
    The start date as (YYYY-MM-DD) of time period of interest.

    end : str
    The end date as (YYYY-MM-DD) of time period of interest.

    Returns
    -------
    discharge : Pandas DataFrame
    Returns a dataframe containing date, discharge, qualification
    codes, site name, and latitude and longitude data

    """

    # Response from site
    parameterCd = ["00065", "00060"]
    resp = hf.get_nwis(site, "dv", start, end).json()


    # Extract values to a pandas dataframe
    discharge = hf.extract_nwis_df(resp)
    
    # Rename columns
    discharge.columns = ["discharge", "flag", 'stage', 'flag']

    # Create sitename column
    site_name = hf.get_nwis_property(resp, key="siteName")[0]

    discharge['siteName'] = site_name

    # Create lat and long column
    geoloc = hf.get_nwis_property(resp, key="geoLocation")[0]["geogLocation"]
    lat = geoloc["latitude"]
    long = geoloc["longitude"]
    discharge["latitude"] = lat
    discharge["longitude"] = long
    return discharge

site = ["06479215","06479438","06479500","06479525","06479770","06480000"]
start = "2018-01-01"
end = "2020-12-01"
temp_list = []

for i in site:
    df = create_df(i, start, end)
    temp_list.append(df)
    
stream_gage_df = pd.concat(temp_list)
stream_gage_df

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
C:\Users\NICK~1.LAM\AppData\Local\Temp/ipykernel_20488/4087259266.py in <module>
      5 
      6 for i in site:
----> 7     df = create_df(i, start, end)
      8     temp_list.append(df)
      9 

C:\Users\NICK~1.LAM\AppData\Local\Temp/ipykernel_20488/3569674067.py in create_df(site, start, end)
     36 
     37     # Rename columns
---> 38     discharge.columns = ["discharge", "flag", 'stage', 'flag']
     39 
     40     # Create sitename column

AttributeError: 'tuple' object has no attribute 'columns'

@mroberge
Owner

Hi Nick!
Thanks for this question and for using hydrofunctions. I changed this function a few versions ago; I've been encouraging people to use the hf.NWIS interface instead. I can send you some code suggestions in the morning.

As you said, the tuple contains a dataframe and a dictionary. The dictionary contains some metadata, but I don't remember if it has lat & long. One way to access just the dataframe is to do this:

discharge, meta = hf.extract_nwis_df(resp)

This line would replace the line where you extract values to a dataframe. You could use meta if you want or just ignore it. The rest of your code should work as is. I'll try it out in the morning!
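If the same script has to run against both older and newer hydrofunctions releases, one option is a small helper that accepts either return shape. This is only a sketch: `dataframe_only` is a hypothetical name, not part of hydrofunctions, and it assumes the newer return value is a tuple whose first element is the dataframe, as described above.

```python
import pandas as pd


def dataframe_only(result):
    """Return the DataFrame from either return shape (hypothetical helper).

    Older releases are described in this thread as returning a DataFrame;
    newer ones return a (DataFrame, dict) tuple.
    """
    if isinstance(result, tuple):
        # Newer shape: the DataFrame is the first element of the tuple.
        return result[0]
    return result


# Stand-ins for the two return shapes, so the sketch runs without a network call.
old_style = pd.DataFrame({"value": [1.0, 2.0]})
new_style = (old_style, {"note": "metadata dictionary"})

df_from_old = dataframe_only(old_style)
df_from_new = dataframe_only(new_style)
```

In the function from the original post, this would wrap the extraction call, e.g. `discharge = dataframe_only(hf.extract_nwis_df(resp))`.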

@mroberge
Owner

I see what you are trying to do here! It looks like you want to create a dataframe that is in the long format similar to R's 'tidy' format. I've been wanting to provide this functionality for my NWIS class for a while.

First, my code above works for your example.

Second, you said that sometimes you get two columns instead of four columns. This happens when a request to a site returns only stage data instead of both stage and discharge. I've never seen that before, so I'm curious. It can be fixed by creating a more robust system for renaming your columns. Right now you assume that you have four columns and give them names by position. Instead, you could use the 'rename' method of dataframes with a mapper function to change the column names. It would work like this: my_df.rename(mapper_function, axis="columns"). Then you just need a mapper function that takes the column name string, checks whether it is for qualifiers or data and whether it is for stage or discharge, and returns something appropriate.

Third, right now your function gives the same name to two different columns. Until we come up with a better renaming function, I would replace that line with something like this:

discharge.columns = ["discharge", "discharge-flag", 'stage', 'stage-flag']
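A mapper function along the lines described above might look like the following sketch. It assumes the column names contain the USGS parameter code as a colon-separated field (00060 for discharge, 00065 for stage) and that qualifier columns end in "_qualifiers"; check `discharge.columns` on your own data before relying on that. `rename_nwis_column` and `PARAMETER_NAMES` are illustrative names, not part of hydrofunctions.

```python
import pandas as pd

# Assumed mapping from USGS parameter codes to short column names.
PARAMETER_NAMES = {"00060": "discharge", "00065": "stage"}
QUALIFIER_SUFFIX = "_qualifiers"  # assumed suffix on flag columns


def rename_nwis_column(column):
    """Map one raw column name to a short name (hypothetical helper)."""
    is_flag = column.endswith(QUALIFIER_SUFFIX)
    base = column[: -len(QUALIFIER_SUFFIX)] if is_flag else column
    fields = base.split(":")
    # Look for a known parameter code among the colon-separated fields.
    name = next(
        (label for code, label in PARAMETER_NAMES.items() if code in fields),
        None,
    )
    if name is None:
        return column  # leave unrecognized columns unchanged
    return f"{name}-flag" if is_flag else name


# Synthetic frame with assumed column names, so the sketch runs offline.
example = pd.DataFrame(
    [[1.2, "A", 3.4, "A"]],
    columns=[
        "USGS:06479215:00060:00003",
        "USGS:06479215:00060:00003_qualifiers",
        "USGS:06479215:00065:00003",
        "USGS:06479215:00065:00003_qualifiers",
    ],
)
renamed = example.rename(rename_nwis_column, axis="columns")

# A site that reports only stage still gets sensible names.
stage_only = example.iloc[:, 2:].rename(rename_nwis_column, axis="columns")
```

Because each column is named from its own contents rather than its position, the same call works whether a site returns two columns or four.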

@nlamkey
Author

nlamkey commented Jan 13, 2022 via email

@mroberge
Owner

Glad to help!
