Senior Staff Writer

Azure Data Factory file wildcard option and storage blobs

If you've turned on the Azure Event Hubs "Capture" feature and now want to process the AVRO files that the service wrote to Azure Blob Storage, you've likely discovered that one way to do this is with Azure Data Factory's Data Flows. Your data flow source is the top-level Azure Blob Storage container where Event Hubs stores the AVRO files in a date/time-based folder structure.


When you define the ADF data flow source, the "Source options" page asks for "Wildcard paths" to the AVRO files. I searched and read several pages at docs.microsoft.com, but nowhere could I find where Microsoft documented how to write a path that includes all AVRO files in all folders of the hierarchy created by Event Hubs Capture.


What ultimately worked was a wildcard path like this: mycontainer/myeventhubname/**/*.avro. The tricky part (coming from the DOS world) was the two asterisks in the middle of the path. They apparently tell the ADF data flow to traverse the blob storage logical folder hierarchy recursively, while the single asterisk in *.avro matches only file names within a folder.
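To make the difference between one asterisk and two more concrete, here is a minimal Python sketch of Ant-style wildcard matching. It is a simplified illustration, not ADF's actual implementation: the helper names ant_pattern_to_regex and matches are my own, the sample blob path is hypothetical, and I am assuming that `**/` stands for zero or more folder levels while `*` stays within a single folder.

```python
import re


def ant_pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a simplified Ant-style wildcard pattern into a regex.

    Assumed rules (a sketch, not ADF's real matcher):
      '**/' matches zero or more whole folder levels
      '**'  (not followed by '/') matches anything, including '/'
      '*'   matches any run of characters except '/'
      '?'   matches exactly one character except '/'
    """
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            parts.append("(?:[^/]+/)*")
            i += 3
        elif pattern.startswith("**", i):
            parts.append(".*")
            i += 2
        elif pattern[i] == "*":
            parts.append("[^/]*")
            i += 1
        elif pattern[i] == "?":
            parts.append("[^/]")
            i += 1
        else:
            # Every other character must match literally.
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts))


def matches(pattern: str, path: str) -> bool:
    """Return True when the whole path matches the wildcard pattern."""
    return ant_pattern_to_regex(pattern).fullmatch(path) is not None


if __name__ == "__main__":
    # Hypothetical blob path in a date/time-based folder hierarchy.
    nested = "mycontainer/myeventhubname/0/2023/01/15/10/30/00.avro"

    print(matches("mycontainer/myeventhubname/**/*.avro", nested))
    print(matches("mycontainer/myeventhubname/*.avro", nested))
```

Under these assumptions, only the pattern with the double asterisk matches the nested path. The single-asterisk pattern fails because `*` cannot cross a folder separator.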


Here's a page from the Apache Ant manual that provides more details about the wildcard matching (patterns) that ADF uses: Directory-based Tasks (apache.org).
