
Create a utility module (util) with useful helper functions #164

Open
emiliom opened this issue Apr 8, 2019 · 5 comments

Comments

@emiliom
Member

emiliom commented Apr 8, 2019

  • Import the output of an odm2api read function into a pandas DataFrame.
  • Import the output of a SamplingFeature read function into a GeoPandas GeoDataFrame (though this implies adding GeoPandas to the requirements, which may be a bit heavy).
  • A database (and/or dataset) presentation summary, analogous to the xarray Dataset printout structure. Focused on ODM2 core entities, it would give a first-order view of the contents of the ODM2 database.
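As a rough illustration of the first bullet, a util helper could flatten the list of objects returned by a read function into a DataFrame. This is a minimal sketch assuming the read functions return SQLAlchemy model instances whose loaded column values live in each object's __dict__ (lazily loaded attributes may be absent until accessed). The name to_dataframe is hypothetical, and private attributes such as SQLAlchemy's _sa_instance_state are dropped.

```python
from types import SimpleNamespace

import pandas as pd


def to_dataframe(objects):
    """Flatten a list of ORM-style objects into a pandas DataFrame.

    Hypothetical util helper: assumes each object's attributes are stored
    in its __dict__ (as with loaded SQLAlchemy instances). Keys starting
    with an underscore (e.g. _sa_instance_state) are skipped.
    """
    records = [
        {key: value for key, value in vars(obj).items() if not key.startswith('_')}
        for obj in objects
    ]
    return pd.DataFrame.from_records(records)


if __name__ == '__main__':
    # Stand-ins for objects returned by an odm2api read function
    features = [
        SimpleNamespace(SamplingFeatureID=1, SamplingFeatureCode='A', _sa_instance_state=None),
        SimpleNamespace(SamplingFeatureID=2, SamplingFeatureCode='B', _sa_instance_state=None),
    ]
    print(to_dataframe(features))
```

Filtering on a leading underscore keeps the helper independent of any particular model class; a real implementation might instead read the column names from the SQLAlchemy mapper.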
emiliom added this to the v0.7.2 release milestone Apr 8, 2019
@aufdenkampe
Member

@emiliom, I really like this idea, as I think it is fundamental to the purpose of having a functional and performant Python API. I think the extra requirements are a small price to pay for also getting efficient and tested I/O capabilities.

Thanks for continuing to move this critical repo forward.

@horsburgh
Member

@emiliom - I don't have a really strong feeling about this, other than that I think we should be very careful about adding additional requirements and complexity. My feeling is that we never finished the core functionality, so adding additional functionality and dependencies should perhaps be secondary to firming up the foundation.

Utility functions would be nice. Is there ongoing work that's driving this?

@aufdenkampe
Member

@horsburgh, good question. I'm also interested in hearing what is motivating this work!

I agree with the points about managing complexity and the need to better develop core functionality.
I also believe that, given that Pandas has become a core part of the standard Python computational science and data science stack, we should consider strong integration with Pandas and GeoPandas as core functionality. This is especially true given that one of the highest priorities we've heard from users and potential users is improving I/O performance (including data alignment and slicing), which is one of the main purposes and advantages of using Pandas.

@emiliom
Member Author

emiliom commented Apr 27, 2019

For my own future reference, to be moved into new issues when I'm ready to work on this stuff.

Enhanced timeseries result values

From the WaterQualityMeasurements_RetrieveVisualize.ipynb example in the odm2api documentation.

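# Read the time series values for ResultID 1 into a pandas DataFrame
# (assumes `read` is an odm2api ReadODM2 instance)
tsValues = read.getResultValues(resultids=[1], lowercols=False)
# Set the index to ValueDateTime for convenience, then sort by time
tsValues.set_index('ValueDateTime', inplace=True)
tsValues.sort_index(inplace=True)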

To conveniently unpack the relevant metadata on variable names and units, use something like tsResult.VariableObj.VariableNameCV and tsResult.UnitsObj.UnitsAbbreviation, where tsResult is the corresponding time series Result object.
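The two steps above (indexing the values and pulling the variable and unit labels from the Result) could be wrapped in a single helper. A minimal sketch, assuming the values DataFrame has ValueDateTime and DataValue columns (as returned with lowercols=False) and that the Result object exposes VariableObj.VariableNameCV and UnitsObj.UnitsAbbreviation as described above; the function name timeseries_values is hypothetical.

```python
from types import SimpleNamespace

import pandas as pd


def timeseries_values(values, result):
    """Return a time-indexed Series of DataValue, labeled with variable and units.

    Hypothetical util helper. Assumes `values` is a DataFrame with
    'ValueDateTime' and 'DataValue' columns, and `result` exposes
    VariableObj.VariableNameCV and UnitsObj.UnitsAbbreviation.
    """
    series = values.set_index('ValueDateTime')['DataValue'].sort_index()
    # Build a label such as "Temperature (degC)" from the Result metadata
    series.name = '{} ({})'.format(
        result.VariableObj.VariableNameCV, result.UnitsObj.UnitsAbbreviation
    )
    return series


if __name__ == '__main__':
    # Stand-ins for read.getResultValues(...) output and a Result object
    values = pd.DataFrame({
        'ValueDateTime': pd.to_datetime(['2019-04-02', '2019-04-01']),
        'DataValue': [2.5, 1.5],
    })
    result = SimpleNamespace(
        VariableObj=SimpleNamespace(VariableNameCV='Temperature'),
        UnitsObj=SimpleNamespace(UnitsAbbreviation='degC'),
    )
    print(timeseries_values(values, result))
```

Returning a new Series rather than modifying the DataFrame in place leaves the full result values (censor codes, quality codes, etc.) untouched for callers who need them.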

GeoPandas GeoDataFrame

Starting point for ingesting Sites into a GeoDataFrame. From the WaterQualityMeasurements_RetrieveVisualize.ipynb example in the odm2api documentation.

import geopandas as gpd
import pandas as pd
from shapely.geometry import Point

# Get all of the SamplingFeatures from the ODM2 database that are Sites
# (assumes `read` is an odm2api ReadODM2 instance)
siteFeatures = read.getSamplingFeatures(sftype='Site')

# Read Sites records into a pandas DataFrame.
# "if sf.Latitude" is used only to instantiate/read the Site attributes;
# note that it also skips Sites whose Latitude is missing (or zero).
df = pd.DataFrame.from_records([vars(sf) for sf in siteFeatures if sf.Latitude])

# Create a GeoPandas GeoDataFrame from the Sites DataFrame
ptgeom = [Point(xy) for xy in zip(df['Longitude'], df['Latitude'])]
gdf = gpd.GeoDataFrame(df, geometry=ptgeom, crs='EPSG:4326')

High-level database core summary

  • Database (and/or dataset) presentation summary, analogous to the xarray Dataset printout structure. Focused on ODM2 core entities. Gives a first-order view of the contents of the ODM2 database.
  • Can be done as one step after opening the ODM2 connection
  • Logical units: SFs (sampling features), actions, methods, results, datasets, variables. These would be read and stored in memory.
  • Automatically provide summaries (counts) of each of these entities by CV type, e.g., SF types, action types, and result types.
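The counting step in this outline could look roughly like the sketch below. It assumes the entities have already been read into lists of objects, and that the CV attribute names (SamplingFeatureTypeCV, ActionTypeCV, ResultTypeCV) match the ODM2 core tables; treat those names, and the helpers summarize and format_summary, as hypothetical. Entities without a mapped CV attribute (e.g., variables here) get a total count only.

```python
from collections import Counter
from types import SimpleNamespace

# Hypothetical mapping from entity label to the CV attribute used for grouping.
# Attribute names follow the ODM2 core tables; treat them as assumptions.
TYPE_ATTRIBUTES = {
    'SamplingFeatures': 'SamplingFeatureTypeCV',
    'Actions': 'ActionTypeCV',
    'Results': 'ResultTypeCV',
}


def summarize(entities):
    """Count entities of each kind, broken down by their CV type.

    `entities` maps a label (e.g. 'Results') to a list of objects.
    Returns {label: (total, Counter mapping type -> count)}.
    """
    summary = {}
    for label, objects in entities.items():
        attr = TYPE_ATTRIBUTES.get(label)
        by_type = Counter(getattr(obj, attr, None) for obj in objects) if attr else Counter()
        summary[label] = (len(objects), by_type)
    return summary


def format_summary(summary):
    """Render the summary as an indented, xarray-like text block."""
    lines = ['<ODM2 database summary>']
    for label, (total, by_type) in summary.items():
        lines.append('{}: {}'.format(label, total))
        # Sort type names as strings so a missing (None) type does not break sorting
        for type_name, count in sorted(by_type.items(), key=lambda kv: str(kv[0])):
            lines.append('    {}: {}'.format(type_name, count))
    return '\n'.join(lines)


if __name__ == '__main__':
    # Stand-ins for objects that would be read once after connecting
    entities = {
        'SamplingFeatures': [
            SimpleNamespace(SamplingFeatureTypeCV='Site'),
            SimpleNamespace(SamplingFeatureTypeCV='Site'),
            SimpleNamespace(SamplingFeatureTypeCV='Specimen'),
        ],
        'Results': [SimpleNamespace(ResultTypeCV='Time series coverage')],
        'Variables': [SimpleNamespace(), SimpleNamespace()],
    }
    print(format_summary(summarize(entities)))
```

Keeping the counting separate from the formatting means the same summary dict could back both a plain-text printout and a notebook-friendly HTML repr.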

@aufdenkampe
Member

@emiliom, thanks for all your work on this!
