Create a utility module (util) with useful helper functions #164

emiliom · 2019-04-08T07:46:00Z

Import into a pandas DataFrame the output of an odm2api read function
Import into a geopandas GeoDataFrame the output of a SamplingFeature read function (though this implies adding GeoPandas to the requirements, which may be a bit heavy)
Database (and/or dataset) presentation summary, analogous to the xarray Dataset printout structure. Focused on ODM2 core entities. Gives a first-order view of the contents of the ODM2 database.
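A minimal sketch of the first item, assuming the read function returns a list of ORM-style objects whose loaded column values are exposed as instance attributes (as SQLAlchemy-mapped objects are). The helper name records_to_dataframe is hypothetical, not part of odm2api. Note that with SQLAlchemy lazy loading, attributes that have not been loaded yet will not appear in vars(obj).

```python
import pandas as pd


def records_to_dataframe(objs, drop_private=True):
    """Convert a list of ORM-style objects into a pandas DataFrame.

    Hypothetical helper: assumes each object exposes its column values as
    instance attributes, as SQLAlchemy-mapped objects do once loaded.
    """
    rows = []
    for obj in objs:
        row = dict(vars(obj))
        if drop_private:
            # SQLAlchemy adds '_sa_instance_state'; drop underscore-prefixed keys
            row = {k: v for k, v in row.items() if not k.startswith('_')}
        rows.append(row)
    return pd.DataFrame.from_records(rows)
```

Usage would look like records_to_dataframe(read.getSamplingFeatures(sftype='Site')), subject to the lazy-loading caveat above.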

aufdenkampe · 2019-04-08T13:37:18Z

@emiliom, I really like this idea, as I think it is fundamental to the purpose of having a functional and performant Python API. I think the extra requirements are a small price to pay for also getting efficient and tested I/O capabilities.

Thanks for continuing to move this critical repo forward.

horsburgh · 2019-04-08T17:13:00Z

@emiliom - I don't have a really strong feeling about this other than I think we should be very careful about adding additional requirements and complexity. My feeling is that we never finished the core functionality and so adding additional functionality and dependencies should perhaps be secondary to firming up the foundation.

Utility functions would be nice. Is there ongoing work that's driving this?

aufdenkampe · 2019-04-08T17:26:09Z

@horsburgh, good question. I'm also interested in hearing what is motivating this work!

I agree with the points about managing complexity and the need to better develop core functionality.
I also believe that, given that Pandas has become a core part of the standard Python computational science and data science stack, we should consider strong integration with Pandas and GeoPandas as core functionality. This is especially true given that one of the highest priorities we've heard from users and potential users is to improve I/O performance (including data alignment and slicing), which is one of the main purposes and advantages of using Pandas.

emiliom · 2019-04-27T18:28:46Z

For my own future reference, to be moved into new issues when I'm ready to work on this stuff.

Enhanced timeseries result values

From the WaterQualityMeasurements_RetrieveVisualize.ipynb example in the odm2api documentation.

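# `read` is the ReadODM2 instance created after opening the connection.
# Retrieve the values for ResultID 1 as a DataFrame, keeping the original
# (CamelCase) column names, then set the index to ValueDateTime for convenience.
tsValues = read.getResultValues(resultids=[1], lowercols=False)
tsValues.set_index('ValueDateTime', inplace=True)
tsValues.sort_index(inplace=True)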
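A sketch of what the sorted ValueDateTime index buys. It uses a small synthetic DataFrame in place of the real getResultValues output; the DataValue column name is an assumption about that output, and the values are made up for illustration.

```python
import pandas as pd

# Synthetic stand-in for read.getResultValues(resultids=[1], lowercols=False);
# the 'DataValue' column name is assumed here.
tsValues = pd.DataFrame({
    'ValueDateTime': pd.to_datetime(['2019-01-02 00:00', '2019-01-01 12:00',
                                     '2019-01-01 00:00', '2019-01-02 12:00']),
    'DataValue': [3.0, 2.0, 1.0, 4.0],
})

# Same steps as above: a sorted DatetimeIndex enables label-based slicing
tsValues.set_index('ValueDateTime', inplace=True)
tsValues.sort_index(inplace=True)

# Select all values on one day using a partial date string
jan1 = tsValues.loc['2019-01-01']

# Aggregate to daily means
daily = tsValues['DataValue'].resample('D').mean()
```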

To conveniently unpack relevant metadata on variable names and units, use something like tsResult.VariableObj.VariableNameCV and tsResult.UnitsObj.UnitsAbbreviation.
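One possible util helper wraps that attribute traversal into a plot or legend label. This is a sketch: result_label is a hypothetical name, and the SimpleNamespace mock only mimics the VariableObj/UnitsObj relationships named above.

```python
from types import SimpleNamespace


def result_label(ts_result):
    """Build a label such as 'Temperature (degC)' from a result object.

    Hypothetical helper: assumes the result exposes VariableObj and UnitsObj
    relationships, as in tsResult.VariableObj.VariableNameCV.
    """
    variable = ts_result.VariableObj.VariableNameCV
    units = ts_result.UnitsObj.UnitsAbbreviation
    return f"{variable} ({units})"


# Mock standing in for a result object returned by odm2api
tsResult = SimpleNamespace(
    VariableObj=SimpleNamespace(VariableNameCV='Temperature'),
    UnitsObj=SimpleNamespace(UnitsAbbreviation='degC'),
)
label = result_label(tsResult)
```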

GeoPandas GeoDataFrame

Starting point for ingesting Sites into a GeoDataFrame. From the WaterQualityMeasurements_RetrieveVisualize.ipynb example in the odm2api documentation.

import pandas as pd
import geopandas as gpd
from shapely.geometry import Point

# Get all of the SamplingFeatures from the ODM2 database that are Sites
siteFeatures = read.getSamplingFeatures(sftype='Site')

# Read Sites records into a pandas DataFrame
# ("if sf.Latitude" is used only to instantiate/read the Site attributes)
df = pd.DataFrame.from_records([vars(sf) for sf in siteFeatures if sf.Latitude])
# Drop SQLAlchemy's internal bookkeeping column, if present
df = df.drop(columns='_sa_instance_state', errors='ignore')

# Create a GeoPandas GeoDataFrame from the Sites DataFrame
# (Point takes (x, y), i.e. (longitude, latitude))
ptgeom = [Point(xy) for xy in zip(df['Longitude'], df['Latitude'])]
gdf = gpd.GeoDataFrame(df, geometry=ptgeom, crs='EPSG:4326')
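Since GeoPandas was flagged earlier in the thread as a potentially heavy requirement, a dependency-free alternative would be to emit GeoJSON point features directly. This is a sketch only: sites_to_geojson is a hypothetical helper, and it assumes each site record is a plain dict with numeric 'Latitude' and 'Longitude' values (None when missing).

```python
import json


def sites_to_geojson(records):
    """Build a GeoJSON FeatureCollection of Point features from site records.

    Hypothetical helper: assumes each record is a dict with 'Latitude' and
    'Longitude' keys; all other keys become feature properties.
    """
    features = []
    for rec in records:
        lat, lon = rec.get('Latitude'), rec.get('Longitude')
        if lat is None or lon is None:
            continue  # skip records without coordinates
        props = {k: v for k, v in rec.items() if k not in ('Latitude', 'Longitude')}
        features.append({
            'type': 'Feature',
            # GeoJSON (RFC 7946) orders coordinates as [longitude, latitude]
            'geometry': {'type': 'Point', 'coordinates': [lon, lat]},
            'properties': props,
        })
    return {'type': 'FeatureCollection', 'features': features}


# Illustrative records, not real ODM2 data
sites = [
    {'SamplingFeatureCode': 'A', 'Latitude': 41.7, 'Longitude': -111.8},
    {'SamplingFeatureCode': 'B', 'Latitude': None, 'Longitude': None},
]
fc = sites_to_geojson(sites)
geojson_text = json.dumps(fc)
```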

High-level database core summary

Database (and/or dataset) presentation summary, analogous to the xarray Dataset printout structure. Focused on ODM2 core entities. Gives a first-order view of the contents of the ODM2 database.
This can be done as one step after opening the ODM2 connection.
Logical units: SFs (SamplingFeatures), actions, methods, results, datasets, variables. These would be read and stored in memory.
Automatically provide summaries (counts) of each of these entities by CV type, e.g., SF types, action types, result types.
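A minimal sketch of the counting and printout steps, using collections.Counter. The summarize name and its input shape (unit name mapped to a list of objects plus the CV attribute to count by) are hypothetical. The CV attribute names follow ODM2 field naming (SamplingFeatureTypeCV, ResultTypeCV), and the mocks stand in for lists such as those returned by read.getSamplingFeatures() and read.getResults().

```python
from collections import Counter
from types import SimpleNamespace


def summarize(entities):
    """Return an xarray-style text summary of entity counts by CV type.

    Hypothetical helper: `entities` maps a logical unit name to a tuple of
    (list of objects, name of the CV attribute to count by).
    """
    lines = ['<ODM2 database summary>']
    for name, (objs, cv_attr) in entities.items():
        counts = Counter(getattr(o, cv_attr) for o in objs)
        lines.append(f'{name}: {len(objs)}')
        for cv, n in sorted(counts.items()):
            lines.append(f'    {cv}: {n}')
    return '\n'.join(lines)


# Mocks standing in for objects read from an ODM2 database
sfs = [SimpleNamespace(SamplingFeatureTypeCV='Site'),
       SimpleNamespace(SamplingFeatureTypeCV='Site'),
       SimpleNamespace(SamplingFeatureTypeCV='Specimen')]
results = [SimpleNamespace(ResultTypeCV='Time series coverage')]
text = summarize({'SamplingFeatures': (sfs, 'SamplingFeatureTypeCV'),
                  'Results': (results, 'ResultTypeCV')})
print(text)
```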

aufdenkampe · 2019-05-01T22:55:13Z

@emiliom, thanks for all your work on this!

emiliom added this to the v0.7.2 release milestone Apr 8, 2019

emiliom added the enhancement label Apr 8, 2019

emiliom mentioned this issue Apr 8, 2019: v0.7.2 release #165 (Closed)

emiliom mentioned this issue Apr 24, 2019: Utility to import into pandas dataframe output of read functions #172 (Open)

emiliom removed this from the v0.7.2 release milestone Apr 25, 2019
