Using Azure DevOps Wiki Content for AI Chatbots
By Anatoly Mironov
Imagine you want to use the content from all pages in a wiki from an Azure DevOps project, perhaps for an AI chatbot or another application. In this post, I’ll share a simple Python script to load an Azure DevOps (AzDo) wiki into memory, including the markdown content and the real URLs of the pages.
Script Explanation
Here’s the script:
# This is a simple Python script
# that loads all the pages through the Azure DevOps APIs,
# including their content (in Markdown) and remote URLs.
# The result can then be used in an AI solution.
from azure.identity import DefaultAzureCredential
import requests
import json

# Run `az login` first
credential = DefaultAzureCredential()

# Get the bearer token (the GUID identifies the Azure DevOps resource)
azdo_scope = "499b84ac-1321-427f-aa17-267ca6975798/.default"
token = credential.get_token(azdo_scope).token
headers = {"Authorization": f"Bearer {token}"}

# Set the variables
organization = "tolle"
project = "my-project"
wiki_name = "my-knowledge"
base_url = f"https://dev.azure.com/{organization}/{project}"
pages_base_url = f"{base_url}/_apis/wiki/wikis/{wiki_name}/pages"


# Recursive function to flatten the JSON structure
def flatten_pages(pages, result=None):
    if result is None:
        result = []
    for page in pages:
        # Add the current page's path and remoteUrl to the result (skip the root)
        if page.get("path") != "/":
            print(f"Page path: {page.get('path')}")
            result.append(
                {
                    "path": page.get("path"),
                    "remoteUrl": page.get("remoteUrl"),
                }
            )
        # If there are subPages, recursively process them
        if "subPages" in page and page["subPages"]:
            flatten_pages(page["subPages"], result)
    return result


# Fetch the whole page tree, starting at the root
url = f"{pages_base_url}?path=/&recursionLevel=full&includeContent=True&api-version=7.1"
response = requests.get(url, headers=headers)
response.raise_for_status()
wiki_pages = response.json()
flat_pages = flatten_pages([wiki_pages])

# Initialize a list to store the new objects
pages_with_content = []

# Iterate through each page in the flattened list
for page in flat_pages:
    # Get the content of the page using the Azure DevOps API.
    # Passing the path via params lets requests URL-encode it
    # (page paths may contain spaces or characters such as "&").
    params = {"path": page["path"], "includeContent": "True", "api-version": "7.1"}
    response = requests.get(pages_base_url, headers=headers, params=params)
    # Check if the request was successful before parsing the body
    if response.status_code == 200:
        content = response.json().get("content")
        if content:
            pages_with_content.append(
                {
                    "content": content,
                    "path": page["path"],
                    "remoteUrl": page["remoteUrl"],
                }
            )
    else:
        print(
            f"Failed to fetch content for {page['remoteUrl']}: {response.status_code} - {response.text}"
        )

# Print the resulting list
print(json.dumps(pages_with_content, indent=2))
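One detail worth keeping in mind: wiki page paths can contain spaces or characters such as `&`, which have a special meaning inside a query string. If you build the per-page URL by hand rather than letting an HTTP library do it, the path should be percent-encoded first. Here is a minimal sketch using only the standard library. `build_page_url` is a hypothetical helper name, and the example path is made up:

```python
from urllib.parse import quote, urlencode


def build_page_url(pages_base_url, page_path, api_version="7.1"):
    """Build a page URL with a safely encoded query string (hypothetical helper)."""
    query = urlencode(
        {
            "path": page_path,
            "includeContent": "True",
            "api-version": api_version,
        },
        quote_via=quote,  # encode spaces as %20 instead of "+"
    )
    return f"{pages_base_url}?{query}"


# Made-up base URL and page path, for illustration only
example_url = build_page_url(
    "https://dev.azure.com/org/proj/_apis/wiki/wikis/w/pages",
    "/FAQ & Tips",
)
print(example_url)
```

Without the encoding, the `&` in the page name would be read as the start of a new query parameter, and the request would ask for a page called `/FAQ ` instead.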
Limitations
This is just a simple example and not suitable for production use. Here are some limitations:
- It has only basic error handling.
- It does not read the images.
- It does not preserve the order of the pages or the page/subpage relationships, which might be important for understanding the content better.
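The last limitation can be reduced without any extra API calls, because the nested response already encodes the hierarchy. The following is a rough sketch, not a tested part of the script: `flatten_with_hierarchy` is a hypothetical variant of the flattening function that also records each page's parent, depth, and sibling order. It assumes the response exposes an `order` field on each page; the sample data is made up.

```python
def flatten_with_hierarchy(page, parent_path=None, depth=0, result=None):
    """Flatten a nested page tree while keeping parent, depth and order."""
    if result is None:
        result = []
    path = page.get("path")
    # Skip the root itself, as the original script does
    if path != "/":
        result.append(
            {
                "path": path,
                "remoteUrl": page.get("remoteUrl"),
                "parentPath": parent_path,
                "depth": depth,
                # Assumption: the API returns an "order" field per page
                "order": page.get("order"),
            }
        )
    # Visit sub-pages in the order they appear in the response
    for sub_page in page.get("subPages") or []:
        flatten_with_hierarchy(sub_page, path, depth + 1, result)
    return result


# Made-up sample shaped like the nested response used in the script
sample_tree = {
    "path": "/",
    "subPages": [
        {
            "path": "/Home",
            "order": 0,
            "remoteUrl": "https://example.invalid/Home",
            "subPages": [
                {
                    "path": "/Home/Setup",
                    "order": 0,
                    "remoteUrl": "https://example.invalid/Home/Setup",
                }
            ],
        },
        {"path": "/FAQ", "order": 1, "remoteUrl": "https://example.invalid/FAQ"},
    ],
}

rows = flatten_with_hierarchy(sample_tree)
for row in rows:
    print("  " * (row["depth"] - 1) + row["path"])
```

With `parentPath` and `depth` stored next to the content, a chatbot pipeline could, for example, prepend the parent page title to each chunk to give the model more context.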
Advantages
Despite its limitations, the script has some advantages:
- It uses DefaultAzureCredential, making it easier to work with and preparing it for running in the cloud (e.g., using managed identities).
Resources
For more information, check out these resources:
- azure-devops, a thin wrapper around the Azure DevOps REST API. I discovered it after I had started looking at the APIs. In my opinion, what I want to achieve is better served by calling the APIs directly, which reduces the risk of potential errors.
- wikis - Azure DevOps REST API Reference, the actual API reference for wikis.
Conclusion
I hope you find this script useful for loading Azure DevOps wiki content into memory. Feel free to modify and expand it to suit your needs. Happy coding!