Review file list adapter #154

Open
dlindhol opened this issue Sep 24, 2020 · 0 comments

Comments

@dlindhol
Member

Given a local directory (as a "file" URI) as the "source" and a regular expression "pattern" that matches the desired file names or paths, this adapter should crawl the file system starting at that directory and make a Sample for each matching file. Matching groups in the regular expression should map to domain variables. The range should include the "uri" text variable, which holds the absolute "file" URI. When modeling a file list dataset (like any other granule list), the domain variables should be chosen so that each sample is unique. Other variables of interest could be included in the range.
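
As a rough illustration of the behavior described above, the following Python sketch walks a directory, matches each file's path against the pattern, and yields one sample per match. Everything in it is hypothetical: the `list_files` name, the tuple-based sample shape, and the choice to match against the path relative to the source directory (with "/" separators) are assumptions made for the example, not the LaTiS API.

```python
import os
import re
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import url2pathname


def list_files(source_uri, pattern):
    """Yield (domain, range) tuples, one per file whose path matches pattern.

    source_uri: "file" URI of the local directory to crawl.
    pattern:    regular expression matched against each file's path relative
                to the source directory, using "/" separators (an assumption).
    """
    root = Path(url2pathname(urlparse(source_uri).path)).resolve()
    regex = re.compile(pattern)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            rel = path.relative_to(root).as_posix()
            match = regex.fullmatch(rel)
            if match:
                domain = match.groups()    # matching groups become domain values
                range_ = (path.as_uri(),)  # absolute "file" URI as the only range value
                yield domain, range_
```

With a pattern such as `(\d{4})/data_(\d+)\.txt`, a file at `2020/data_001.txt` would produce the domain `("2020", "001")`. Whether those two groups together are enough to make samples unique depends on the directory layout, which is the modeling concern raised above.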

Stretch goals (could become new tickets but might influence the design):

  • Consider making the uri variable the first in the range, although that may complicate the mapping of matching groups to variables. (LaTiS v2 puts it after all other variables, but doesn't use domain variables appropriately to support uniqueness.)
  • Consider ordering implications. Can the crawl preserve order so that we avoid sorting?
  • Consider relative file paths with a "baseUri" metadata property, or hiding the baseUri from the user altogether while preserving it so that a zip writer could still access the files.
  • Add an option to include file size (only if it is defined in the model?)
  • Be smart about only reading directories that might possibly have a match. For example, if the file path has dates encoded in it, only crawl those directories that fall within the selected data range. (A hybrid approach with a granule list generator might work better for this.)
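
A couple of these stretch goals can be sketched together. The Python example below is one possible approach, not a proposed design: it visits directories and files in sorted name order, returns paths relative to the root (as a "baseUri" scheme would need), and skips directories that a caller-supplied predicate rules out. The names `crawl_pruned` and `year_in_range`, and the assumed layout with a four-digit year as the top-level directory, are all hypothetical.

```python
import os


def crawl_pruned(root, keep_dir=lambda rel_dir: True):
    """Yield "/"-separated file paths relative to root.

    Within each directory, files are yielded in sorted name order, then
    subdirectories are visited in sorted name order. keep_dir receives a
    directory's path relative to root; directories it rejects are never read.
    """
    for dirpath, dirnames, filenames in os.walk(root, topdown=True):
        rel = os.path.relpath(dirpath, root)
        prefix = "" if rel == "." else rel.replace(os.sep, "/") + "/"
        # Editing dirnames in place controls which subdirectories os.walk enters.
        dirnames[:] = sorted(d for d in dirnames if keep_dir(prefix + d))
        for name in sorted(filenames):
            yield prefix + name


def year_in_range(start, end):
    """Build a keep_dir predicate for an assumed <year>/... directory layout."""
    def keep(rel_dir):
        top = rel_dir.split("/", 1)[0]
        # Keep directories that do not look like a year, to avoid dropping matches.
        return not top.isdigit() or start <= int(top) <= end
    return keep
```

Note that this order is only globally sorted by path when files live in leaf directories; a directory holding both files and subdirectories yields its own files first. For date-encoded layouts that is usually enough to avoid a separate sort, but it is an assumption worth checking against real data.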

See latis' FileListAdapter and variations.

@dlindhol dlindhol changed the title Add a file list adapter Review file list adapter Oct 6, 2020