A quick hack to bulk-label data in SharePoint Online and Microsoft Teams
- Posted on May 4, 2021
How to easily mass-label content in SharePoint Online (SPO), without matching on specific sensitive information types, is a topic that has been discussed many times.
Currently, Microsoft offers two options: auto-labeling with a sensitive information type, or Microsoft Cloud App Security (MCAS). Both technologies have limitations that make them problematic in an enterprise environment. Note: The method described here is intended only as a workaround until Microsoft releases a solution that achieves this goal.
To keep the article short, I’ll skip the in-depth explanation of the two solutions and their pros and cons, but I will briefly mention their most prominent limitations.
For auto-labeling, Microsoft has a built-in limit: you can label only up to 25,000 files a day in SPO (Teams and OneDrive included).
Another limitation of auto-labeling for SPO is that a policy can be applied to only 10 sites or OneDrive accounts at a time, and you can have at most 10 auto-labeling policies at a time. These are official limitations from Microsoft, mentioned in several of its documents.
The second option from Microsoft is using Information Protection with MCAS.
The limitations of Information Protection with MCAS are a bit trickier. In some of its documentation, Microsoft mentions a limit of 100 files a day, which can be removed.
What Microsoft does not mention in any public documentation I was able to find (and I consulted multiple people at Microsoft on this) is what removing the 100-files-a-day limit actually means: the maximum then becomes 3,000 files a day. Keep in mind that Microsoft documentation is subject to updates, so additional information may be publicly available by the time you read this, or the limit could have been removed altogether, as I imagine Microsoft would be interested in that.
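To put the two daily caps into perspective, here is a minimal Python sketch that estimates how many days a labeling backlog would take at a fixed daily limit. The two constants are the figures quoted in this article and may no longer be current; the function name and the backlog of one million files are purely illustrative assumptions.

```python
# Daily caps as quoted in this article; they may have changed since.
AUTO_LABELING_FILES_PER_DAY = 25_000
MCAS_FILES_PER_DAY = 3_000


def days_to_label(total_files: int, files_per_day: int) -> int:
    """Return the minimum number of whole days needed at a fixed daily cap."""
    if total_files < 0 or files_per_day <= 0:
        raise ValueError("total_files must be >= 0 and files_per_day must be > 0")
    # Integer ceiling division: a partially used day still counts as a day.
    return -(-total_files // files_per_day)


if __name__ == "__main__":
    backlog = 1_000_000  # hypothetical number of files waiting for a label
    print("Auto-labeling:", days_to_label(backlog, AUTO_LABELING_FILES_PER_DAY), "days")
    print("MCAS:", days_to_label(backlog, MCAS_FILES_PER_DAY), "days")
```

The estimate assumes the cap is reached every single day and ignores throttling, so real-world numbers are likely to be worse.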
MCAS has also been heavily throttled by SPO over the past few months. As a result, labeling a file takes much longer (on average, more than five hours) than the public documentation suggests with words such as “immediate” or “instant.” This makes both testing and automatic labeling of sensitive data challenging, as a file can exist for hours or even a day without being labeled.
I’ve previously implemented a solution using Information Protection with MCAS without any of the issues described above, and files were processed and labeled within 10-15 minutes of upload. That has not been the case since at least December 2020 (and is still not the case in March 2021).
For the workaround, we have two options: one without PowerShell and one with PowerShell. The PowerShell option offers additional features, such as copying the label from one file, targeting specific file types only, or automatically inserting a justification when lowering the label. It will be covered in a subsequent article.
This method, with or without PowerShell, has a few prerequisites. The AIP client with unified labeling (UL) support must be installed on the device. The user account being used must have an AIP P1 license assigned and must have access to the labels (or at least have the label you want to apply published to it). The SPO sites must be synced with that user account’s OneDrive and saved on the device, so the machine must have sufficient storage space. Lastly, you must be signed in to the OneDrive client on the machine with the account being used.
Since this method requires you to sync the data from SPO to OneDrive, I recommend setting up an Azure virtual machine (VM) with enough storage for all the SPO data that is to be labeled, and signing in to OneDrive on that VM for the process. It can, however, also be done on your normal work device.
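Before syncing, it can help to check whether the machine actually has room for the data. The following is a minimal Python sketch, not part of the original workaround: the function name, the 10 percent headroom, and the 500 GB figure are assumptions, and the required size is something you would look up yourself (for example, from the storage figures of the SPO sites you plan to sync).

```python
import shutil


def has_enough_space(required_bytes: int, path: str, headroom: float = 0.10) -> bool:
    """Return True if the drive holding `path` has room for the data plus headroom.

    `headroom` is an assumed safety margin (10 percent by default), not a
    documented requirement.
    """
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= required_bytes * (1 + headroom)


if __name__ == "__main__":
    # Hypothetical example: 500 GB of SPO data, checked against the drive
    # that holds the current working directory.
    required = 500 * 1024**3
    if has_enough_space(required, "."):
        print("Enough free space to sync the data.")
    else:
        print("Not enough free space; consider a larger disk or syncing in batches.")
```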
The option without PowerShell - Mass labeling data from SPO sites
To do this, navigate to the SPO site (or specific folder) you want to bulk label and use the “Sync” option.
Once the SPO site (or specific document folder or subfolder) is synced to your OneDrive, we can start the labeling. Keep in mind that if you are using an account solely for mass labeling and have synced 10 or 20 SPO sites where all supported files need to be labeled, you can simply right-click the OneDrive in File Explorer and choose “Classify and Protect”.
(NOTE: the machine used for the screenshots is in Danish). You should also note that you will not have this option when right-clicking on a folder or file if you do not have a somewhat up-to-date version of the AIP client installed on the machine.
It is recommended to have the data saved on the device: right-click the OneDrive or the specific folder and choose “Always keep on this device” so that you see the green circle with a check mark instead of the cloud icon. This significantly speeds up the process, as the files are already on the device and do not need to be downloaded during the actual labeling. Once labeling is complete, you can right-click the folder again and choose “Free up space” to stop keeping the files on your device.
In case you do not want to simply label everything synced to your OneDrive with a specific label, but instead want to target one specific site you have synced, you can right-click on the folder and label that folder instead.
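If you want a rough idea of what a folder contains before labeling it, a small inventory by file extension can be useful. This is a minimal Python sketch under assumptions: the function name and the folder path in the usage example are hypothetical, and the script does not know which file types the AIP client supports; it only counts what is there.

```python
from collections import Counter
from pathlib import Path


def count_by_extension(root: str) -> Counter:
    """Count the files under `root` (recursively), grouped by lowercase extension."""
    counts = Counter()
    for path in Path(root).rglob("*"):
        if path.is_file():
            # Files without an extension are grouped under "(none)".
            counts[path.suffix.lower() or "(none)"] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical path to a synced SPO site; replace it with your own folder.
    synced_folder = "."
    for extension, number in count_by_extension(synced_folder).most_common():
        print(f"{extension}: {number}")
```

Comparing this inventory with the number of files in the results sheet afterwards gives a quick indication of how many files the client did not pick up.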
The blue button that says “Anvend” will be “Apply” in English.
Note: If a file has a pre-existing label from another tenant, or a label you do not have the rights or permissions to change or remove, a sign-in window will appear telling you that the account doesn’t exist in the tenant, needs to be added as an external user, or similar. That file will show as failed, and the comment will state why it failed when you look at the results later.
Once the AIP client has finished, we can view the results by clicking the blue, underlined text at the bottom of the window. This opens an Excel sheet with an overview of all the files matched by the client in the source location (supported file types only).
You will be able to see the files, and the status – in my case, they were all completed. Fuldført = completed.
Failed files (for example, due to a pre-existing label from another tenant) will show as failed and state why.
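With thousands of files, scanning the results sheet by eye gets tedious. The following minimal Python sketch summarizes the results, assuming you have saved the sheet as a CSV file. The column names (`File Name`, `Status`, `Comment`) and the success value are assumptions made for illustration; check the headers in your own sheet, and note that status values may be localized (for example, “Fuldført” on a Danish machine).

```python
import csv
from collections import Counter

# Assumed column names; adjust them to match the headers in your results sheet.
FILE_COLUMN = "File Name"
STATUS_COLUMN = "Status"
COMMENT_COLUMN = "Comment"


def summarize(csv_file, success_value="Completed"):
    """Return (status counts, list of (file, comment) for rows that did not succeed)."""
    counts = Counter()
    problems = []
    for row in csv.DictReader(csv_file):
        status = (row.get(STATUS_COLUMN) or "").strip()
        counts[status] += 1
        if status != success_value:
            problems.append((row.get(FILE_COLUMN) or "", row.get(COMMENT_COLUMN) or ""))
    return counts, problems


if __name__ == "__main__":
    import io

    # Made-up sample rows standing in for a real export.
    sample = io.StringIO(
        "File Name,Status,Comment\n"
        "report.docx,Completed,\n"
        "contract.pdf,Failed,Label from another tenant\n"
    )
    status_counts, failed = summarize(sample)
    print(dict(status_counts))
    for name, comment in failed:
        print(f"{name}: {comment}")
```

To run it against a real export, open the file with `open(path, newline="", encoding="utf-8-sig")` and pass the file object to `summarize`, together with the success value shown in your sheet.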
You now have a way to bulk label massive amounts of data in SPO with a “default” or “standard” label.
The solution was created with resources from Microsoft. Many thanks to Ashish Ranjan and the team for their help and assistance. For more information on our Security Services, visit Avanade.com/security.