As a follow-up to my previous post:
Configuring a Logic App to run an Azure AI Search Service Indexer to index new documents for RAG
In that post, I demonstrated a Logic App with actions that check the status of an AI Search Service Indexer to determine whether it is still running, and used that to decide whether a recently submitted file had been indexed. The design rested on some assumptions about how files are uploaded, and I acknowledged that this method likely isn’t the best way to handle scenarios where multiple files may be uploaded. As I wrote there:
With the copying of the blob from source to target configured, the next step is to trigger the AI Search Service Indexer to start indexing the storage account so the new document can be indexed. Depending on the purpose of the documents, the indexer may be configured to run on an hourly, daily, weekly, or some other schedule that has been communicated to the users, and if that’s the case, then we won’t need to trigger the indexer immediately. For the purpose of this example, we’re going to assume that document uploads are rare and the indexer does not run on a schedule, so new documents need to be indexed and searchable immediately. Let’s also assume that only one person uploads the documents, because if multiple users upload at the same time then several indexer runs can be requested and it would be difficult to identify which run completed for which Logic App execution (I’ve tried to see if there was a unique ID for the indexer execution that I could use but there did not appear to be one).
This stuck with me for a while, so I took a bit of time over the weekend to see if there were other ways to improve this. What I found is that Azure AI Search has a Search Documents API that would allow me to check whether a specific file was indexed. The documentation can be found here:
Search Documents (Azure AI Search REST API)
https://learn.microsoft.com/en-us/rest/api/searchservice/search-documents
One of the POST requests described in the documentation lets you specify a query in the body to determine whether a file with a given file name was indexed by the indexer. The POST format is outlined as follows:
POST https://[service name].search.windows.net/indexes/[index name]/docs/search?api-version=[api-version]
Content-Type: application/json
api-key: [admin or query key]
To provide an example, take the following index and its field names configured for the AI Search Service:
The title field in this index represents the full file name and extension of the file that has been indexed. To test this with Postman, we would configure a POST call specifying a filter where title equals the file name of the document:
https://dev-aisearch.search.windows.net/indexes/vector-policy/docs/search?api-version=2024-07-01
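For anyone who would rather script this check than use Postman, here is a minimal Python sketch of the same call. It assumes the title field is marked as filterable in the index schema (a filter on it will not work otherwise), and the function names build_search_body, is_indexed, and search_for_file are hypothetical helpers of my own, not part of any SDK. The service and index names would be the ones from the URL above, and the key is whatever admin or query key you have.

```python
import json
import urllib.request


def build_search_body(file_name: str) -> dict:
    """Build a Search Documents request body that filters on the title field."""
    # OData string literals escape a single quote by doubling it.
    escaped = file_name.replace("'", "''")
    return {
        "search": "*",                       # match everything, then narrow with the filter
        "filter": f"title eq '{escaped}'",   # assumes title is a filterable field
        "select": "title",                   # only return the field we care about
        "count": True,
    }


def is_indexed(response: dict) -> bool:
    """Return True when the response contains at least one matching document."""
    return len(response.get("value", [])) > 0


def search_for_file(service: str, index: str, api_key: str, file_name: str,
                    api_version: str = "2024-07-01") -> dict:
    """POST the filter query to the index and return the parsed JSON response."""
    url = (f"https://{service}.search.windows.net/indexes/{index}"
           f"/docs/search?api-version={api_version}")
    request = urllib.request.Request(
        url,
        data=json.dumps(build_search_body(file_name)).encode("utf-8"),
        headers={"Content-Type": "application/json", "api-key": api_key},
        method="POST",
    )
    with urllib.request.urlopen(request) as reply:
        return json.load(reply)
```

Calling is_indexed(search_for_file("dev-aisearch", "vector-policy", key, file_name)) would then tell you whether a document with that title is in the index. An empty value array in the response means the file has not been indexed yet.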
- The Indexer status is not running
- Searching for the file name returns the confirmation that the file has been indexed
Putting this all together will result in the following Logic App workflow:
- DocSearchResult is not equal to an empty string, denoted with ''
- IndexerStatus is equal to success
The function for this is:
and(not(equals(variables('DocSearchResult'), '')), equals(variables('IndexerStatus'), 'success'))
Note that the action Set variable – Exit loop or not is just one I put in for troubleshooting and is not necessary.
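To make the loop logic easier to follow outside of the Logic App designer, here is a rough Python sketch of the same idea. It is not what the Logic App runs, just an illustration: should_exit mirrors the expression above, and wait_until_indexed is a hypothetical polling helper that takes two callables (one returning the search result as a string, one returning the indexer status) so it makes no assumptions about how those values are fetched. The retry count and delay are placeholder values I picked for the example.

```python
import time
from typing import Callable


def should_exit(doc_search_result: str, indexer_status: str) -> bool:
    """Mirror the loop condition: a non-empty search result and a successful run."""
    return doc_search_result != "" and indexer_status == "success"


def wait_until_indexed(get_doc_search_result: Callable[[], str],
                       get_indexer_status: Callable[[], str],
                       max_attempts: int = 10,
                       delay_seconds: float = 30.0) -> bool:
    """Poll both checks until the exit condition holds or attempts run out."""
    for attempt in range(max_attempts):
        if should_exit(get_doc_search_result(), get_indexer_status()):
            return True
        # Skip the wait after the final attempt since nothing follows it.
        if attempt < max_attempts - 1:
            time.sleep(delay_seconds)
    return False
```

Both conditions have to hold at the same time. A successful indexer run on its own does not prove this particular file made it into the index, and a search hit while the indexer is still working could be premature.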
Hope this provides more information on how to check whether a file has been indexed by the AI Search Service. I would like to acknowledge that this would not handle cases where a file already exists, which would need more logic (perhaps by retrieving the timestamps of the file in the storage account).