Greetings Thwack,
Apologies in advance if something similar has already been posted here.
The attached Python script can be used to automatically load-balance polling engines for NPM based on MachineType and number of elements. The intention is that this script would be run periodically in order to level out the workload of each polling engine. In order to use this script, you will need to add your server information and credentials to lines 9-11.
Requirements:
- Python 3
- The orionsdk package (pip install orionsdk)
- All polling engines must be able to reach all nodes
My environment consists of:
- Six Orion servers (one core server and five additional polling engines)
- NPM, NCM, NTA and LA products
- Approximately 50,000 NPM elements
- Approximately 3,500 nodes with 102 distinct MachineTypes
How it works:
The script starts off by getting a list of polling engines, as well as a list of your MachineTypes and the element count for each MachineType. The first query is pretty straightforward; the second is an adapted version of the query posted here.
Once we have our polling engines and MachineTypes, we can run an algorithm to determine how to distribute those MachineTypes between polling engines. Here we take the largest remaining MachineType group (by element count) and assign it to the polling engine with the fewest elements assigned to it so far. We continue until all MachineTypes are assigned to polling engines.
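To make the assignment step concrete, here is a minimal sketch of that greedy approach. It is not the code from the attached script: the function name assign_machine_types, the input shapes (a dict of MachineType to element count and a list of EngineIDs), and the alphabetical tie-break are assumptions made for the illustration.

```python
import heapq


def assign_machine_types(element_counts, engine_ids):
    """Greedy sketch: largest MachineType first, onto the least-loaded engine.

    element_counts: dict of MachineType -> element count (assumed shape)
    engine_ids: list of polling engine IDs (assumed shape)
    Returns (assignment, loads), where assignment maps MachineType -> EngineID
    and loads maps EngineID -> total elements assigned.
    """
    # Min-heap of (elements assigned so far, engine ID).
    heap = [(0, engine_id) for engine_id in engine_ids]
    heapq.heapify(heap)

    assignment = {}
    # Largest groups first; ties broken by name so the result is repeatable.
    ordered = sorted(element_counts.items(), key=lambda kv: (-kv[1], kv[0]))
    for machine_type, count in ordered:
        load, engine_id = heapq.heappop(heap)  # least-loaded engine
        assignment[machine_type] = engine_id
        heapq.heappush(heap, (load + count, engine_id))

    loads = {engine_id: load for load, engine_id in heap}
    return assignment, loads
```

Because a whole MachineType always lands on a single engine, how evenly things balance depends on how your element counts are spread across MachineTypes. Many smallish groups will level out better than a few huge ones.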
After the decisions are made, we then iterate through the list of polling engines. We get the URIs for the nodes that need to be assigned to any given polling engine, and run a bulkupdate to set the EngineID on each node.
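As a rough sketch of what that update loop can look like: the helper below is hypothetical (the name apply_assignments, the uris_by_engine dict and the dry_run flag are not from the attached script). It assumes swis is an orionsdk SwisClient, whose bulkupdate method takes a list of URIs followed by the properties to set as keyword arguments.

```python
def apply_assignments(swis, uris_by_engine, dry_run=True):
    """Set EngineID on each group of node URIs (hypothetical helper).

    swis: a client object exposing bulkupdate(uris, **properties),
          assumed here to be an orionsdk SwisClient
    uris_by_engine: dict of EngineID -> list of node URIs (assumed shape)
    dry_run: when True, nothing is sent to Orion
    Returns a dict of EngineID -> number of nodes that would be / were moved.
    """
    moved = {}
    for engine_id, uris in uris_by_engine.items():
        if not uris:
            continue  # nothing to move to this engine
        if not dry_run:
            # One bulk call per polling engine instead of one call per node.
            swis.bulkupdate(uris, EngineID=engine_id)
        moved[engine_id] = len(uris)
    return moved
```

With a real connection you would create the client with something like orionsdk.SwisClient(server, username, password) and pass it in. Leaving dry_run=True lets you see the counts without touching any nodes.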
Finally, we write a CSV that can be reviewed to see a list of all nodes, their old EngineID and their new EngineID in case you want to see what was done.
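For anyone who wants to build a similar report, a minimal sketch using the standard csv module is below. The column names and the function name write_change_report are assumptions for the example, and the attached script may lay out its CSV differently.

```python
import csv

# Column names are illustrative, not necessarily those used by the script.
FIELDS = ["NodeID", "Caption", "OldEngineID", "NewEngineID"]


def write_change_report(rows, out):
    """Write one CSV line per node, showing its old and new EngineID.

    rows: iterable of dicts keyed by FIELDS (assumed shape)
    out: any writable text file object
    """
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    for row in rows:
        writer.writerow(row)
```

When writing to disk, open the file with newline="" (for example open("engine_changes.csv", "w", newline=""), where the filename is just a placeholder) so the csv module controls the line endings.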
In my environment, this has resulted in a difference of only 3 elements between the busiest polling engine and the least busy polling engine. If you want to try it out yourself without actually applying any changes, you can comment out the swis.bulkupdate command at around line 143. The script will still output what it would have changed.
Hopefully this script helps somebody in some way. Please comment if you have any feedback!