Add-on packages and tools for your NSO deployment.
Export observability data to InfluxDB.
The NSO Observability Exporter (OE) package allows Cisco NSO to export observability-related data using software-industry-standard formats and protocols, such as the OpenTelemetry protocol (OTLP). It supports the export of progress traces using OTLP, as well as the export of transaction metrics based on the progress trace data into an InfluxDB database.
To provide insight into the state and working of a system, operators make use of different types of data:
Logs: Information about events taking place in the system, usually for humans to interpret.
Traces: Detailed information about the requests as they traverse the system.
Metrics: Measures of quantifiable aspects of the system for statistical analysis, such as the amount of successful and failed requests.
Each of the data types serves a different purpose. Metrics allow you to get a high-level view of whether the system behaves in an expected manner, for example, no or few failed requests. Metrics also help identify the load on the system (e.g., CPU usage or the number of concurrent requests), but they do not tell you what is happening with a particular request or transaction, for example, the one that is failing.
Tracing, on the other hand, shows the path and the time that the request took in different parts of the overall system. Perhaps the request failed because one of the subsystems took too long to provide the necessary data. That's the kind of information a trace gives you.
However, to understand what took a specific subsystem a long time to respond, you need to consult the relevant logs.
As these are different types of data, different software solutions exist to process, store, and examine them.
For tracing, the package exports progress trace data using the standard OTLP format. Each trace carries a trace-id that uniquely identifies it and can be supplied as part of the request (see the Progress Trace section in the NSO Development Guide for details), allowing you to find the relevant data in a busy system. Tools such as Jaeger or Grafana (with Grafana Tempo) can then ingest the OTLP data and present it in a graphical way for further analysis.
The Observability Exporter package also performs additional processing of the tracing data and exports the calculated metrics to an InfluxDB time-series database. Using Grafana or a similar tool, you can extract and accumulate the relevant values to produce customized dashboards, for example, showing the average transaction length for each type of service in NSO.
The package exports four different types of metrics, called measurements, to InfluxDB:
span: Data for individual parts of the transaction, also called spans.
span-count: Number of concurrent spans, for example, how many transactions are in the prepare phase (prepare span) at the same time.
transaction: Sum of span durations per transaction, for example, the cumulative time spent in service create code when a transaction configures multiple services.
transaction-lock: Details about the transaction lock, such as queue length when acquiring or releasing the lock.
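Once data starts flowing, you can query these measurements directly. A minimal InfluxQL sketch (assuming the default nso database and the duration field used in the Grafana example later in this guide):

```
-- Average span duration over the last hour (illustrative); add a filter on the
-- span name to narrow down to one part of the transaction
SELECT MEAN("duration") FROM "span" WHERE time > now() - 1h
```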
To install the Observability Exporter add-on, follow the steps below:
Install the prerequisite Python packages: parsedatetime, opentelemetry-exporter-otlp, and influxdb. To install the packages, run the command pip install -r src/requirements.txt from the package folder.
Add the Observability Exporter package in a manner suitable for your NSO installation. This usually entails copying the package file to the appropriate packages/ folder and performing a package reload. For more information, refer to the NSO product documentation on package management.
Observability Exporter configuration resides under the progress export container in NSO. All export functions can be enabled or disabled through the top-level enabled leaf.
To configure the export of tracing data, use the otlp container. This is a presence container that controls whether export is enabled or not. In the container, you can define the target host and port for sending data, as well as the transport used. Unless configured otherwise, the data is exported to the localhost using the default OTLP port, so there is minimal configuration required if you run the collector locally, for example, on the same system or as a sidecar in a container deployment.
The InfluxDB export is configured and enabled using the influxdb presence container, where you set the host to export metrics to. You can also customize the port number, username, password, and database name used for the connection.
Under progress export, you can also configure extra-tags, additional tag name-value pairs that the system adds to the measurements. These are currently only used for InfluxDB.
The following is a sample configuration snippet using different syntax styles:
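A minimal Cisco-style CLI sketch along these lines (host, credentials, and database name are placeholders; the exact leaf syntax is defined by the package YANG model):

```
admin@ncs(config)# progress export enabled
admin@ncs(config)# progress export otlp host 127.0.0.1
admin@ncs(config)# progress export influxdb host 127.0.0.1
admin@ncs(config)# progress export influxdb username nso password nso-secret database nso
admin@ncs(config)# commit
```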
Note that the current version of the Observability Exporter uses the InfluxDB v1 API. If you run an InfluxDB 2.x database instance, you need to enable v1 API client access with the influx v1 auth create command or a similar mechanism. Refer to the InfluxData documentation for more information.
This example shows how to use the Jaeger software (https://www.jaegertracing.io) to visualize the progress traces. It requires you to install Jaeger on the same system as NSO and is therefore only suitable for demo or development purposes.
First, make sure that you have a running NSO instance and that you have successfully added the Observability Exporter package. To verify, run the show packages package observability-exporter command from the NSO CLI.
Download and run a recent Jaeger all-in-one binary from the Jaeger website, using the --collector.otlp.enabled switch:
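For example (illustrative; the exact binary name depends on the release you download):

```
$ ./jaeger-all-in-one --collector.otlp.enabled=true
```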
Keep Jaeger running, and from another terminal, enter the NSO CLI to enable OTLP data export:
Jaeger should now be receiving the transaction traces. However, if you have no running transactions in the system, there will be no data. So, make sure that you have some traces by performing a trivial configuration change:
Now you can connect to the Jaeger UI at http://localhost:16686 to explore the data. In the Search pane, select "NSO" service and click Find Traces.
Clicking on one of the traces will bring you to the trace view, such as the following one.
This example shows you how to store and do basic processing and visualization of data in InfluxDB. It requires you to install InfluxDB on the same system as NSO and is therefore only suitable for demo or development purposes.
First, ensure you have a running NSO instance and have successfully added the Observability Exporter package. To verify, run the show packages package observability-exporter command from the NSO CLI.
Next, set up an InfluxDB instance. Download and install the InfluxDB 2 binaries and the corresponding influx CLI appropriate for your NSO system. See the InfluxData documentation for details, e.g., brew install influxdb influxdb-cli on a macOS system.
Make sure that you have started the instance, then complete the initial configuration of InfluxDB. During the configuration, create an organization named my-org and a bucket named nso. Do not forget to perform the Influx CLI setup. To verify that everything works in the end, run:
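For example, with the influx CLI (the bucket ID column in the output is what you need in the next step):

```
$ influx bucket list
```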
In the output, find the ID of the nso bucket that you have created. For example, here it is 5d744e55fb178310, but yours will be different.
Create a username/password pair for v1 API access:
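A sketch of such a command with the influx CLI (replace the bucket ID, username, and password with your own values):

```
$ influx v1 auth create --read-bucket 5d744e55fb178310 --write-bucket 5d744e55fb178310 \
    --username nso --password nso-secret
```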
Use the BUCKET_ID that you have found in the output of the previous command.
Now connect to the NSO CLI and configure the InfluxDB exporter to use this instance:
The username and password should match those created with the previous command, while the database name (using the default of nso here) should match the bucket name. Make sure that you have some data for export by performing a trivial configuration change:
Open the InfluxDB UI at http://localhost:8086 and log in, then select the Data Explorer from the left-hand menu. Using the query builder, you can explore and visualize the data.
For example, select the nso bucket, the span measurement, and duration as a field filter. Keeping other settings at their default values, it will graph the average (mean) times that various parts of the transaction take. If you wish, you can further configure another filter for name, to only show the values for the selected part.
Note that the above image shows data for multiple transactions over a span of time. If there is only a single transaction, the graph will look empty and will instead show a single data point when you hover over it.
This example shows integrating the Observability Exporter with Grafana to monitor NSO application performance.
First, ensure you have a running NSO instance and have successfully added the Observability Exporter package. To verify, run the show packages package observability-exporter command from the NSO CLI.
Next, set up an InfluxDB instance. Follow steps 2 to 4 from the Minimal Metrics Example with InfluxDB.
Next, set up a Grafana instance. Refer to Grafana Docs for installing Grafana on your system. A MacOS example:
Install Grafana.
Start the Grafana instance.
Configure the Grafana Organization name.
Add InfluxDB as a Data Source in Grafana. Download the influxdb-data-source.json file, replace "my-token" in the file with the actual token from the InfluxDB instance, and run the command below.
Set up the NSO example Dashboard. This step requires the jq command-line tool to be installed first on the system.
Download the sample NSO dashboard JSON file dashboard-nso-local.json and run the command below. Replace the "value" field with the actual Jaeger UI URL for the entry whose "name" is INPUT_JAEGER_BASE_URL under "inputs".
(Optional) Set the NSO dashboard as a default dashboard in Grafana.
Connect to the NSO CLI and configure the InfluxDB exporter:
Perform a few trivial configuration changes in NSO to generate some data, then open the Grafana UI at http://localhost:3000/ and log in with the username admin and password admin. With the NSO dashboard set as the default dashboard, you will see various charts and graphs showing NSO metrics.
Below are the panels showing metrics related to the transactions, such as transaction throughput, longest transactions, transaction locks held, and queue length.
Below are the panels showing metrics related to the services, such as the mean/max duration for create service, the mean duration for run service, and the service's longest spans.
Below are the panels showing metrics related to the devices, such as device locks held, longest device connection, longest device sync-from, and concurrent device operations.
All previously mentioned databases and visualization software can also be brought up in a Docker environment with Docker volumes, making it possible to persist the metric data in the data stores after the Docker containers are shut down.
To facilitate bringing up the containers and the interconnectivity of the database and visualization containers, a setup bash script called setup.sh is provided, together with a compose.yaml file that describes all the Docker containers to create and start, as well as configuration files to configure each container.
This diagram shows an overview of the containers that Compose creates and starts and how they are connected.
To create the Docker environment described above, follow these steps:
Make sure Docker and Docker Compose are installed on your machine. Refer to the Docker documentation on installing Docker for your respective OS. You can verify that Docker and Compose are installed by executing the following commands in a terminal and getting a version number as output:
docker
docker compose
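For example:

```
$ docker --version
$ docker compose version
```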
Download the NSO Observability Exporter package from CCO, untar it, and cd into the setup folder:
Make the setup.sh script executable:
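For example:

```
$ chmod +x setup.sh
```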
Run the setup.sh script without arguments to use the default ports for the containers and the default username and password for InfluxDB, or supply arguments to set a specific port for each container:
Use the default values defined in the script.
Provide port values and InfluxDB configuration.
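For example, to use the defaults (run ./setup.sh --help to list the available flags):

```
$ ./setup.sh
```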
To run with a secure protocol configuration, whether HTTPS or secure gRPC, use the provided setup script with the appropriate security settings. Ensure the necessary security certificates and keys are available: for HTTPS and secure gRPC, a TLS certificate and private key file are necessary. For instructions on creating self-signed certificates, refer to Creating Self-Signed Certificate.
The script will output NSO configuration to configure the Observability Exporter and URLs to visit the dashboards of some of the containers.
You can run the setup.sh script with the --help flag to print help information about the script and see the default values used for each flag.
Enable HTTPS: To enable OTLP through HTTPS, a root certificate authority (CA) certificate file in PEM format needs to be specified in the NSO configuration for both traces and metrics.
After configuring the Observability Exporter with the NSO configuration printed by the setup.sh script, e.g., using the CLI load command or the ncs_load tool, traces and metric data should be visible in Jaeger, InfluxDB, and Grafana as shown in the previous setup.
The setup can be brought down with the following commands:
Bring down containers only.
Bring down containers and remove volumes.
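A sketch assuming the standard Docker Compose teardown commands, run from the setup folder containing compose.yaml:

```
$ docker compose down        # stop and remove the containers only
$ docker compose down -v     # also remove the volumes holding the stored data
```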
Prerequisites: OpenSSL: Ensure that OpenSSL is installed on your system. Most Unix-like systems come with OpenSSL pre-installed.
Generate a Private Key: First, generate a private key using OpenSSL. Run the following command in your terminal or command prompt:
Install OpenSSL:
Create a Root CA (Certificate Authority):
Generate SSL Certificates Signed by the Root CA:
Use the Certificates:
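A minimal sketch of these steps (file names such as rootCA.key, rootCA.pem, server.key, and server.crt are illustrative choices, not names required by the setup script):

```
# Create a Root CA key and a self-signed CA certificate
$ openssl genrsa -out rootCA.key 4096
$ openssl req -x509 -new -nodes -key rootCA.key -sha256 -days 365 \
    -out rootCA.pem -subj "/CN=Example Root CA"

# Create a server key and certificate signing request, then sign it with the Root CA
$ openssl genrsa -out server.key 2048
$ openssl req -new -key server.key -out server.csr -subj "/CN=localhost"
$ openssl x509 -req -in server.csr -CA rootCA.pem -CAkey rootCA.key \
    -CAcreateserial -out server.crt -days 365 -sha256

# Use rootCA.pem as the CA certificate and server.key/server.crt where the
# setup script and NSO configuration expect the TLS key and certificate
```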
In the previous test environment setup, we exported traces to Jaeger and metrics to Prometheus, but progress traces and metrics can also be sent to Splunk Observability Cloud.
In order to send traces and metrics to Splunk Observability Cloud, either the OpenTelemetry Collector Contrib or Splunk OpenTelemetry Collector can be used.
Here is an example config that can be used with the OpenTelemetry Collector Contrib to send traces and metrics:
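A configuration sketch along those lines (the access token and realm are placeholders; see the exporter READMEs referenced below for the authoritative option names):

```yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317

exporters:
  sapm:
    access_token: "<SPLUNK_ACCESS_TOKEN>"
    endpoint: "https://ingest.<SIGNALFX_REALM>.signalfx.com/v2/trace"
  signalfx:
    access_token: "<SPLUNK_ACCESS_TOKEN>"
    realm: "<SIGNALFX_REALM>"

service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [sapm]
    metrics:
      receivers: [otlp]
      exporters: [signalfx]
```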
An access token and the endpoint of your Splunk Observability Cloud instance are needed to start exporting traces and metrics. The access token can be found under the Settings -> Access Tokens menu in your Splunk Observability Cloud dashboard. The endpoint can be constructed by looking at your Splunk Observability Cloud URL and replacing <SIGNALFX_REALM> with the realm you see in the URL, e.g., https://ingest.us1.signalfx.com/v2/trace.
Traces can be accessed at https://app.us1.signalfx.com/#/apm/traces and Metrics are available when accessing or creating a dashboard at https://app.us1.signalfx.com/#/dashboards.
More options for the sapm and signalfx exporters can be found at https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/exporter/sapmexporter/README.md and https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/exporter/signalfxexporter/README.md, respectively.
In the current Observability Exporter version, metrics from spans, that is metrics that are currently sent directly to InfluxDB, cannot be sent to Splunk.
Download Splunk Enterprise. Visit the Splunk Enterprise download page, select the appropriate version for your operating system (Linux, Windows, macOS), and download the installer package.
Install Splunk Enterprise.
On Linux:
Transfer the downloaded .rpm or .deb file to your Linux server.
Install the package:
For RPM-based distributions (RedHat/CentOS):
For DEB-based distributions (Debian/Ubuntu):
On Windows:
Run the downloaded .msi installer.
Follow the prompts to complete the installation.
Start Splunk.
On Linux:
On Windows:
Open the Splunk Enterprise application from the Start Menu.
Access Splunk Web Interface.
Navigate to http://<splunk-server>:8000. Log in with the default credentials (admin/changeme).
Create an Index via the Splunk Web Interface:
Click on Settings in the top-right corner.
Under the Data section, click on Indexes.
Create a New Index:
Click on the New Index button.
Fill in the required details:
Index Name: Enter a name for your index (e.g., nso_traces, nso_metrics).
Index Data Type: Select the type of data (e.g., Events or Metrics).
Home Path, Cold Path, and Thawed Path: Leave these as default unless you have specific requirements.
Click on the Save button.
Enable HTTP Event Collector (HEC) on Splunk Enterprise. Before you can use Event Collector to receive events through HTTP, you must enable it. For Splunk Enterprise, enable HEC through the Global Settings dialog box.
Click Settings > Data Inputs.
Click HTTP Event Collector.
Click Global Settings.
In the All Tokens toggle button, select Enabled.
Choose nso_traces or nso_metrics as the index for the respective HEC tokens.
Click Save.
Create an Event Collector token on Splunk Enterprise. To use HEC, you must configure at least one token.
Click Settings > Add Data.
Click monitor.
Click HTTP Event Collector.
In the Name field, enter a name for the token.
Click Next.
Click Review.
Confirm that all settings for the endpoint are correct and click Submit. Otherwise, click < to make changes.
Configure the OpenTelemetry Protocol (OTLP) Collector:
Create or edit the otelcol.yaml file to include the HEC configuration. Example configuration:
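A sketch of what such a configuration might look like, using the splunk_hec exporter from the Collector Contrib distribution (the token, endpoint, and index values are placeholders that should match the HEC tokens and indexes created above):

```yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317

exporters:
  splunk_hec/traces:
    token: "<HEC_TOKEN_FOR_TRACES>"
    endpoint: "https://<splunk-server>:8088/services/collector"
    index: "nso_traces"
  splunk_hec/metrics:
    token: "<HEC_TOKEN_FOR_METRICS>"
    endpoint: "https://<splunk-server>:8088/services/collector"
    index: "nso_metrics"

service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [splunk_hec/traces]
    metrics:
      receivers: [otlp]
      exporters: [splunk_hec/metrics]
```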
Save the configuration file.
For additional support questions, refer to Cisco Support.
Schedule provisioning tasks in NSO.
Phased Provisioning is a Cisco NSO add-on package for scheduling provisioning tasks. Initially designed for gradual service rollout, it leverages NSO actions to give you more fine-grained control over how and when changes are introduced into the network.
A common way of using NSO is by an operator performing an action through the NSO CLI, which takes place immediately. However, when you perform a large number of changes or other actions, you likely have additional requirements, such as:
You want to limit how many changes or actions can run at the same time.
You want to schedule changes or actions to run outside of business hours.
One or two actions failing is fine, but if several of them fail, you want to stop provisioning and investigate.
Phased Provisioning allows you to do all of that and more. As the framework invokes standard NSO actions to do the actual work, you can use it not just for services provisioning but for NED migrations and other operations too.
The NSO Phased Provisioning binaries are available from Cisco Software Central and contain the phased-provisioning package. Add it to NSO in a manner suitable for your installation. This usually entails copying the package file to the appropriate packages/ folder and performing a package reload. If in doubt, please refer to the NSO product documentation on package management.
To verify the status of the package on your NSO instance, run the show packages package phased-provisioning command.
If you later wish to uninstall, simply remove the package from NSO, which will also remove all Phased-Provisioning-specific configuration and data. It is highly recommended that you make a backup before removing the package, in case you need to restore or reference the data later.
After adding the package, Phased Provisioning does not require any special configuration and you can start using it right away. All you need is an NSO action that you want to use it with. In this Quickstart, that will be the device NED migrate action, which is built into NSO.
The goal is to migrate a number of devices from the router-nc-1.0 NED to router-nc-1.1. One way of doing this is with the /devices/migrate action all at once, or by manually invoking the /devices/device/migrate action on each device with the new-ned-id parameter as:
However, considering you want to achieve a phased (staggered) rollout, create a Phased Provisioning task to instruct the framework of the actions that you want to perform:
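A Cisco-style CLI sketch of such a task, built from the leaf names described below (the exact syntax is defined by the cisco-phased-provisioning YANG model):

```
admin@ncs(config)# phased-provisioning task run_ned_migrate target /devices/device
admin@ncs(config)# phased-provisioning task run_ned_migrate action action-name migrate
admin@ncs(config)# phased-provisioning task run_ned_migrate action variable new-ned-id value router-nc-1.1
admin@ncs(config)# commit
```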
This configuration defines a task named run_ned_migrate. It also defines a target value (that is, an instance identifier) to select the nodes on which you want to run the action.
You provide the action name with the action/action-name value and set any parameters that the action requires. The name of the parameter can be set through variable/name and the value of the parameter can be set through any one of the following:
variable/value for the string value of the parameter.
variable/expr for an XPath expression (the value is determined through XPath evaluation with respect to the nodes filtered by target and filter, or the target-nodes defined while running the task).
Here, the single argument is new-ned-id with the value of router-nc-1.1.
If the action has an input empty leaf, you can set only variable/name without defining any value, for example, the device sync-from action with the no-wait-for-lock flag.
In the current configuration, the action will run on all the devices. This is likely not what you want, and you can further limit the nodes using an XPath expression through a filter value, for example, to only devices that currently use the router-nc-1.0 NED:
If you want to run an action on heterogeneous nodes which may not be determined from a single target and filter, you can define a task without target and filter values. However, while running the task, you must dynamically set the nodes in target-nodes of the run action, described later in this document.
Note: Please check the description for /phased-provisioning/task/action/action-name regarding the conditions to determine action execution status.
In addition to what the task will do, you also need to specify how and when it will run. You do this with a Phased Provisioning policy:
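A sketch of such a policy in Cisco-style CLI (the container path and syntax are illustrative; the leaf names follow the description in the next paragraph):

```
admin@ncs(config)# phased-provisioning policies policy one_by_one batch size 1
admin@ncs(config)# phased-provisioning policies policy one_by_one error-budget 1
admin@ncs(config)# phased-provisioning policies policy one_by_one schedule immediately
admin@ncs(config)# commit
```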
The "one_by_one" policy, as it is named in this example, will run one migration at a time (batch/size
), with an error-budget
of 1, meaning the task will stop as soon as more than one migration fails. The value for schedule
is immediately
, which means as soon as possible after you submit this task for processing. Instead, you could also schedule it for a particular time in the future, such as Saturday at 1 a.m.
Finally, configure the task to use this policy:
Having committed the task, you must also submit it to the scheduler if you want it to run. Use the /phased-provisioning/task/run action to do so:
If the task does not already have a target set, you must pass dynamic nodes in target-nodes, for example:
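A sketch of both forms (illustrative syntax; the second passes dynamic target-nodes):

```
admin@ncs# phased-provisioning task run_ned_migrate run
admin@ncs# phased-provisioning task run_ned_migrate run target-nodes [ /devices/device[name='ex1'] /devices/device[name='ex2'] ]
```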
Note: The selected target-nodes must support invoking the selected action or self-test action with the provided parameters, as defined in the task.
You can observe the status of the task with the show phased-provisioning task-status command, such as:
With many items (nodes) in the task, the output could be huge and you might want to use the brief action instead (note that there is no show in the command now):
In case enough actions fail, the error budget runs out and the execution stops:
To restart processing, use the /phased-provisioning/task/resume action, allowing more errors to accumulate (if you reset the error budget) or not:
You can temporarily pause an in-progress task, such as when you observe a problem and want to intervene to avoid additional failures.
Use the /phased-provisioning/task/pause action to pause a task. This will suspend the task with an appropriate reason. You can later restart the task by executing the /phased-provisioning/task/resume action.
The task will be suspended with a reason, as observed in task-status.
If you want to retry running the task for the failed nodes, use the /phased-provisioning/task/retry-failures action. This will move the failed nodes back to pending, so that the nodes can be executed again. You can also re-execute specific failed nodes by specifying them in the failed-nodes input of the retry-failures action. This action does not change the error-budget.
To retry all failed nodes:
To retry specific failed nodes:
If the task has already completed, then after executing this action, the task will be marked suspended with an appropriate reason. You can then resume the task again to retry the failed nodes.
While great for running actions, you can also use this functionality to provision (or de-provision) services in a staged/phased manner. There are two steps to achieving this:
First, configure service instances as you would normally, but commit the changes with the commit no-deploy command.
Second, configure a Phased Provisioning task to invoke the reactive-re-deploy action for these services, taking advantage of all the Phased Provisioning features.
Here is an example of a trivial static-dns service.
You can verify that using commit no-deploy did not result in any device configuration yet:
Then, create a task for phased provisioning, using the one_by_one policy from the Quickstart:
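A sketch of such a task in Cisco-style CLI (the task name and service path are illustrative):

```
admin@ncs(config)# phased-provisioning task deploy_dns target /static-dns:static-dns
admin@ncs(config)# phased-provisioning task deploy_dns action action-name reactive-re-deploy
admin@ncs(config)# phased-provisioning task deploy_dns policy one_by_one
admin@ncs(config)# commit
```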
Finally, start the task:
You can follow the task's progress with the following show command:
Note: This command will refresh the output every second, stop it by pressing Ctrl+c.
For simple services, such as the preceding static-dns, successfully updating device configuration may be a sufficient indicator that the service was deployed without problems. For more complex services, you typically want to run additional tests to ensure everything went according to plan. Such services will often have a self-test action that performs this additional validation.
Phased Provisioning allows you to run custom verification, whether you are deploying services or doing some other type of provisioning. You can configure this under the self-test container in the task configuration.
Please check the description for /phased-provisioning/task/self-test/action-name regarding the restrictions applied for action validation.
For example, the following commands will configure the service self-test action for validation.
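A sketch, assuming the task from the previous section and a service model that defines a self-test action:

```
admin@ncs(config)# phased-provisioning task deploy_dns self-test action-name self-test
admin@ncs(config)# commit
```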
Alternatively, you can use self-test/test-expr with an XPath expression, which must evaluate to a true value.
In addition to an immediately scheduled policy, you can opt for a policy with future scheduling. This allows you to set a (possibly recurring) time when provisioning takes place.
You can set two separate parameters:
time: Configures at what time to start, in the Vixie-style cron format (further described below).
window: Configures for how long after the start time new items can start processing.
Using both of these parameters enables you to limit the execution of a task to a particular time of day, such as when you have a service window. If there are still items in the task after the current window has passed, the system will wait for the next occurrence to process the remaining items.
The format for the time parameter is as follows:
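A sketch of the five-field Vixie cron layout, with an example matching the Saturday 1 a.m. schedule mentioned earlier:

```
*  *  *  *  *
|  |  |  |  +---- day of week (0-7, where both 0 and 7 mean Sunday)
|  |  |  +------- month (1-12)
|  |  +---------- day of month (1-31)
|  +------------- hour (0-23)
+---------------- minute (0-59)

0 1 * * 6        # every Saturday at 01:00
```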
Each of the asterisks (*) represents a field, which can take one of the following values:
A number, such as 5.
A number range, such as 5-10.
An asterisk (*), meaning any. For example, 0-59 and * are equivalent for the first (minute) field.
Each of these values can further be followed by a slash (/) and a number, denoting a step. For example, if used in the first field, */3 means every third minute instead of every minute (* only).
A number, range, and step can also be combined together with a comma (,) for each of these values. For example, if used in the first field, 5,10-13,20,25-28,*/15 means at minute 5, every minute from 10 through 13, at minute 20, every minute from 25 through 28, and every 15th minute.
You can update a policy used in a task irrespective of the task's running status (init, in-progress, completed, or suspended).
Updating a completed task's policy will not impact anything.
If an init task's policy schedule is updated to immediately, the task will start executing batches immediately. A change to error-budget will also be reflected immediately. A change to batch-size, schedule/future/time, or schedule/future/window will only take effect when the task starts as per the new schedule time.
If a suspended task's policy is updated, the changes will be reflected upon resuming the task.
For an in-progress task:
If the policy schedule is updated from immediately to schedule/future/time, or schedule/future/time is changed to a new time, then after the completion of the current batch, the next batch execution will be stopped and scheduled as per the new schedule time.
If the policy schedule is updated from schedule/future/time to immediately, the task will continue to run until it completes.
An update to batch-size or schedule/future/window will be reflected upon the next batch execution after the current batch completes.
An update to error-budget will be reflected immediately in allocated-error-budget, whereas current-error-budget is adjusted depending on previously failed nodes.
Phased Provisioning tasks perform no access checks for the configured actions. When a user is given access to the Phased Provisioning feature through NACM, they can implicitly invoke any action in NSO. That is, even if a user can't access an action directly, they can configure a task that invokes this action.
To amend this behavior, you can wrap Phased Provisioning functionality with custom actions or services and in this way limit available actions.
Tasks with future-scheduled policies make use of the NSO built-in scheduler functionality, which runs the task as the user that submitted it for scheduling (the user that invoked the run action on the task). If external authentication or PAM supplies the user groups for this user, or you explicitly set groups using the ncs_cli -g command when connecting, the scheduling may fail.
This happens if the admin user is not mapped to a group with sufficient NACM permissions in NSO, such as in the default system-install configuration.
To address this issue, add the "admin" user to the correct group, using the /nacm/groups/group/user-name configuration. Instead of "admin", you can choose a different user with the /phased-provisioning/local-user setting. In any case, this user must have permission to invoke actions on the /cisco-pdp:phased-provisioning/task/ node. For example:
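A sketch of such a mapping (ncsadmin is the administrator group typically present in a default NSO installation; adjust the group name to your NACM setup):

```
admin@ncs(config)# nacm groups group ncsadmin user-name [ admin ]
admin@ncs(config)# commit
```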
As a significantly less secure alternative, you can change the default for a user without a matching group by using the /nacm/exec-default setting.
The phased-provisioning data model is in phased-provisioning/src/yang/cisco-phased-provisioning.yang.
Manage resource allocation in NSO.
The NSO Resource Manager package contains both an API for generic resource pool handling, called the resource allocator, and two applications (id-allocator and ipaddress-allocator) utilizing the API. The applications are explained separately in the following sections.
This version of NSO Resource Manager is 4.2.8 and was released together with NSO version 6.4.
NSO is often used to provision services in the networking layer. It is not unusual that these services require network-level information that is not (or cannot be) part of the instance data provided by the northbound system, so it needs to be fetched from, and eventually released back to a separate system. A common example of this is IP addresses used for layer-3 VPN services. The orchestrator tool is not aware of the blocks of IP addresses assigned to the network but relies on lower layers to fulfill this need.
Some customers have software systems to manage these types of temporary assets. E.g., for IP-addresses, they are usually known as IP Address Management (IPAM) systems. There is a whole industry of solutions for such systems, ranging from simple open-source solutions to entire suites integrated with DNS management. See IP address management for more on this.
There are customers that either don't have an IPAM system for services that are planned for NSO or are not planning to get one for this single purpose. They usually don't want the operational overhead of another system and/or don't see the need for a separate investment. These customers are looking for NSO to provide basic resource allocation and lifecycle management for the assets required for services managed by NSO. They appreciate the fact that NSO is not an appropriate platform to provide more advanced features from the IPAM world like capacity planning nor to integrate with DNS and DHCP platforms. This means that the NSO Resource Manager does not compete with full-blown systems but is rather a complementary feature.
The NSO Resource Manager interface, the resource allocator, provides a generic resource allocation mechanism that works well with services and in a high availability (HA) configuration. Specific resource allocators are expected to be implemented as separate NSO packages. A service will then have the possibility to use allocator implementations dedicated to different resources.
The YANG model of the resource allocator (resource-allocator.yang) can be augmented with different resource pools, as is the case for the two applications id-allocator and ipaddress-allocator. Each pool has an allocation list where services are expected to create instances to signal that they request an allocation. Request parameters are stored in the request container and the allocation response is written in the response container.
Since the allocation request may fail, the response container contains a choice where one case is for error and one for success.
Each allocation list entry also contains an allocating-service leaf-list. These are instance identifiers that point to the services that requested the resource. These are the services that will be redeployed when the resource has been allocated.
The resource allocation packages should subscribe to several points in this resource-pool tree. First, they must detect when a new resource pool is created or deleted. Secondly, they must detect when an allocation request is created or deleted. A package may also augment the pool definition with additional parameters, for example, an IP address allocator may wish to add configuration parameters for defining the available subnets to allocate from, in which case it must also subscribe to changes to these settings.
The installation of this package is done as with any other package, as described in the NSO Packages section of the Administration Guide.
The API of the resource allocator is defined in this YANG data model:
Looking at High Availability, there are two things we need to consider - the allocator state needs to be replicated, and the allocation needs only to be performed on one node.
The easiest way to replicate the state is to write it into CDB-oper and let CDB perform the replication. This is what we do in the ipaddress-allocator.
We only want the allocator to allocate addresses on the primary node. Since the allocations are written into CDB they will be visible on both primary and secondary nodes, and the CDB subscriber will be notified on both nodes. In this case, we only want the allocator on the primary node to perform the allocation.
We therefore read the HA mode leaf from CDB to determine which HA mode the current subscriber is running in; if HA mode is not enabled, or if HA mode is enabled and the current node is primary we proceed with the allocation.
The synchronized allocation API uses reactive FASTMAP, so the user can allocate resources while keeping a synchronous interface. Resources are allocated in the create callback; at that moment, everything we modify in the database is part of the service intent and FASTMAP data. We need to guarantee that we use a stable resource and communicate to other services which resources we have used, so during the create callback, we store what we have allocated. Other services that are evaluated in the same transaction, subsequent to ours, will see these allocations, and when our service is redeployed, it will not have to create the allocations again.
When an allocation raises an exception, in case the pool is exhausted or the referenced pool does not exist in the CDB, the commit will get aborted. Synchronous allocation doesn't require a service re-deploy to read the allocation. The same transaction can read the allocation, and commit dry-run or get-modification should show the allocation details as output.
Synchronous allocation is only supported through the Java and Python APIs provided by the Resource Manager.
This section explores deployment information and procedures for the NSO ID Allocator (id-allocator). The NSO ID Allocator is an extension of the generic resource allocation mechanism called the NSO Resource Manager. It can allocate integers which can serve, for instance, as VLAN identifiers.
The ID Allocator can host any number of ID pools. Each pool contains a certain number of IDs that can be allocated. They are specified by a range, and potentially broken into several ranges by a list of excluded ranges.
The ID allocator YANG models are divided into a configuration data-specific model (idallocator.yang) and an operational data-specific model (id-allocator-oper.yang). Users of this package will request allocations in the configuration tree. The operational tree serves as an internal data structure of the package.
An ID request can allocate either the lowest possible ID in a pool or a specified (by the user) value, such as 5 or 1000.
Allocation requests can be synchronized between pools. This synchronization is based on the ID of the allocation request itself (such as, for instance, allocation1); the result is that the allocations will have the same allocated value across pools.
This section presents some simple use cases of the NSO ID Allocator, presented using Cisco-style CLI.
The NSO ID Allocator requires a username to be configured by the service application when creating an allocation request. This username will be used to redeploy the service application once a resource has been allocated. Default NACM rules deny all standard users access to the /ralloc:resource-pools list. These default settings are provided in the initial_data/aaa_init.xml file of the resource-manager package.
It is up to the administrator to add a rule that allows the user to perform the service re-deploy.
Instructions for the administrator on how to write these rules are detailed in the AAA Infrastructure.
There are two alarms associated with the ID Allocator:
Empty Alarm: This alarm is raised when the pool is empty, and there are no available IDs for further allocation.
Low threshold Reached Alarm: This alarm is raised when the pool is nearing empty, e.g., there is only 10% or fewer left in the pool.
Since Resource Manager version 4.0.0, the operational data model is not compatible with previous versions. In the version 4.0.0 YANG model, there is a new element called allocationId added to /id-allocator/pool/allocation to support sync ID allocation. The system will run the upgrade script automatically (when the new version of the Resource Manager is loaded) if there is a YANG model change in the new version. Users can also run the script manually when upgrading the Resource Manager from 3.5.6 (or any version below 4.0.0) to version 4.0.0 or above; the script will add the missing allocationId element in the CDB operational data path /id-allocator/pool/allocation. The upgrade Python script is located in the Resource Manager package: python/resource_manager/rm_upgrade_nso.py.
After running the script manually to update the CDB, the user must request package reload or restart ncs to reload the new CDB data into the ID pool Java object in memory. For example, in the NSO CLI console: admin@ncs> request packages reload force.
id-allocator-tool Action
A set of debug and data tools (contained in the rm-action/id-allocator-tool action) is available to help admin or support to operate on RM data. Two parameters can be provided to the id-allocator-tool action: operation and pool. All the process info and results will be logged in ncs-java-vm.log, and the action itself just returns the result. Here is a list of the valid operation values for the id-allocator-tool action:
check_missing_report: Scan the current resource pool and ID pool in the system, and identify and report the missing elements for each id-allocator entry without fixing them.
fix_missing_allocation_id: Add the missing allocation ID for each ID allocator entry.
fix_missing_owner: Add the missing owner info for each ID allocator entry.
fix_missing_allocation: Create the missing allocation entry in the ID allocator for each ID pool allocation response/id.
fix_response_id: Scan the ID pool and check if the allocation contains an invalid allocation request ID, and release the allocation from the ID pool if found. This happens for sync allocation when the device configuration fails after a successful ID allocation and then causes a service transaction failure, leaving the ID pool containing a successfully allocated ID while the allocation request response doesn't exist.
persistAll: Manually sync from the ID pool in memory to the ID allocator in CDB.
printIdPool: Print the current ID pool data in ncs-java-vm.log for debugging purposes.
Note that when a pool parameter is provided, the operation will be on this specific ID pool, and if no pool is provided, the operation will be running on all ID pools in the system.
This section contains deployment information and procedures for the Tail-f NSO IP Address Allocator (ipaddress-allocator) application.
The NSO IP Address Allocator application contains an IP address allocator that uses the Resource Manager API to provide IP address allocation. It uses a RAM-based allocation algorithm that stores its state in CDB as oper data.
The file resource-manager/src/java/src/com/tailf/pkg/ipaddressallocator/IPAddressAllocator.java contains the part that deals with the Resource Manager APIs, whereas the RAM-based IP address allocator resides under resource-manager/src/java/src/com/tailf/pkg/ipam.
The IPAddressAllocator class subscribes to five points in the DB:
/ralloc:resource-pools/ip-address-pool: To be notified when new pools are created/deleted. It needs to create/delete instances of the IPAddressPool class. Each instance of IPAddressPool handles one pool.
/ralloc:resource-pools/ip-address-pool/subnet: To be notified when subnets are added/removed from an existing address pool. When a new subnet is added, it needs to invoke the addToAvailable method of the right IPAddressPool instance. When a pool is removed, it needs to reset all existing allocations from the pool, create new allocations, and re-deploy the services that had the allocations.
/ralloc:resource-pools/ip-address-pool/exclude: To detect when new exclusions are added and when old exclusions are removed.
/ralloc:resource-pools/ip-address-pool/range: To be notified when ranges are added to or removed from an address pool.
/ralloc:resource-pools/ip-address-pool/allocation: To detect when new allocation requests are added and when old allocations are released. When a new request is added, the right size of subnet is allocated from the IPAddressPool instance and the result is written to the response/subnet leaf, and finally, the service is redeployed.
This section presents some simple use cases of the NSO IP Address Allocator. It uses the C-style CLI.
The NSO IP Address Allocator requires a username to be configured by the service applications when creating an allocation request. This username will be used to redeploy the service applications once a resource has been allocated. The default NACM rules deny all standard users access to the /ralloc:resource-pools list. These default settings are provided in the initial_data/aaa_init.xml file of the Resource Manager package.
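As an illustration, a minimal pool and allocation request in C-style CLI (the pool name, subnet, and allocation ID are placeholders, and the request leaf shown is an assumption; consult resource-allocator.yang for the exact names):

```
admin@ncs(config)# resource-pools ip-address-pool lab-pool subnet 10.0.0.0 24
admin@ncs(config)# resource-pools ip-address-pool lab-pool allocation a1 username admin request subnet-size 32
admin@ncs(config)# commit
admin@ncs# show resource-pools ip-address-pool lab-pool allocation a1 response
```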
There are two alarms associated with the IP Address Allocator:
Empty Alarm: This alarm is raised when the pool is empty, and there are no available IPs that can be allocated.
Low Threshold Reached Alarm: This alarm is raised when the pool is nearing empty, e.g., there are only 10% or fewer separate IPs left in the pool.
ip-allocator-tool Action
A set of debug and data tools contained in the rm-action/ip-allocator-tool action is available to help the admin or support personnel to operate on the RM data. Two parameters can be provided to the ip-allocator-tool action: operation and pool. All the process info and the results will be logged in ncs-java-vm.log, and the action itself just returns the result. Here is a list of the valid operation values for the ip-allocator-tool action.
fix_response_ip: Scan the IP pool to check if the allocation contains an invalid allocation request ID, and release the allocation from the IP pool if found. This happens for sync allocation when the device configuration fails after a successful IP allocation and then causes a service transaction to fail, leaving the IP pool containing a successfully allocated IP while the allocation request response doesn't exist.
printIpPool: Print the current IP pool data in ncs-java-vm.log for debugging purposes.
Note that when a pool parameter is provided, the operation will be on this specific IP pool.
This section covers the NSO Resource Manager data models.
The NSO Packages section in the NSO Administration Guide.
The AAA Infrastructure section in the NSO Administration Guide.
Description of the APIs exposed by the Resource Manager package.
About this Guide
This NSO Resource Manager (RM) API Guide describes the APIs exposed by the Resource Manager package that you can use to allocate IPs from the IP resource pool and to allocate the IDs from ID resource pools.
Intended Audience
This guide is intended for Cisco advanced services developers, network engineers, and system engineers to install the RM package inside NSO and then utilize the APIs exposed by the RM package to allocate and manage IP subnets and IDs as required by other CFPs installed alongside this RM package inside NSO.
Additional Documentation
This documentation requires the reader to have a good understanding of NSO and its usage as described in the following NSO documentation:
The APIs exposed by the Resource Manager package are used to allocate IP subnets and IDs from the IP and ID resource pools respectively by the applications requesting the resources. The APIs help to allocate, update, or deallocate the resources. You can make API calls to the resource pools as long as the pool is not exhausted of the resources. If the pool is exhausted of resources or if the referenced pool does not exist in the database when there is a request, the allocation raises an exception.
When a service makes multiple resource allocations from a single pool, the optional ‘name’ parameter allows the service to distinguish the different allocations. By default, the parameter value is an empty string.
Resource allocation can be synchronous or asynchronous.
The synchronized allocation API request uses Reactive-Fast-Map to allocate resources while keeping the interface synchronous in appearance. This means that when you create an allocation request from a Northbound system, you can see the allocation results, such as the requested IP subnet/ID, in the same transaction. If a Northbound system makes an allocation request and, in the same transaction, a configuration is applied to a specific device, the request response is received and processed by the RM, and the configurations are pushed to the device within that same transaction. Thus, the Northbound user can see the resulting modifications in the commit dry-run output.
During a resource request, the resource is allocated and stored in the create callback. This allocation is visible to other services that are run in the same or subsequent transactions and therefore avoids the recreation of resource when the service is redeployed. Synchronous allocation does not require service re-deploy to read allocation. The same transaction can read allocation. Commit dry-run or get-modification displays the allocation details as output.
The following is an example of a Northbound service callback passed with the required API parameters for both synchronous and asynchronous IPv4 allocations. The example uses the pool-example package as a reference. The request describes the details it uses, such as the pool and device. Each allocation has an allocation ID. In the following example, the allocating service pulls one IPv4 address from the IPv4 resource pool. The requesting service then uses this allocated IP address to set the interface address on the device southbound to NSO.
The payloads below demonstrate the Northbound service allocation request using the Resource Manager synchronous and asynchronous flows. The API pulls one IP address from the IPv4 resource pool and sets the returned IP address on the interface of the IOS1 device.
IPv4 and IPv6 have separate IP pool types; there is no mixed IP pool. You can specify a prefixlen parameter for IP pools to allocate a net of a given size. The default value is the maximum prefix length: 32 for IPv4 and 128 for IPv6.
The following APIs are used in IPv4 and IPv6 allocations.
Resource Manager exposes the API calls to request IPv4 and IPv6 subnet allocations from the resource pool. These requests can be synchronous or asynchronous. This topic discusses the APIs for these flows.
The NSO Resource Manager interface and the resource allocator provide a generic resource allocation mechanism that works well with services. Each pool has an allocation list where services are expected to create instances to signal that they request an allocation. The request parameters are stored in the request container, and the allocation response is written in the response container.
The APIs exposed by RM are implemented in Python as well as Java, so the NB user can configure the service to be a Java package or a Python package and call the allocator API as per the implementation. The NB user can also use NSO CLI to make an allocation request to the IP allocator RM package.
This section covers the Java APIs exposed by the RM package to the NB user to make IP subnet allocation requests.
The asynchronous subnet allocation requests can be created for a requesting service with:
The redeploy type set to the default type or set to redeployType.
The CIDR mask length set to invert the subnet mask length for Boolean operations with IP addresses, or set to not invert the subnet mask length.
The starting IP address of the subnet passed to the requesting service redeploy type (default/redeployType).
The following are the Java APIs for asynchronous IP allocation requests.
Common Exceptions Raised by Java APIs for Allocation Not Successful
The API throws the following exception error if the requested resource pool does not exist: ResourceErrorException
The API throws the following exception error if the requested resource pool is exhausted: AddressPoolException
The API throws the following exception error if the requested netmask is invalid: InvalidNetmaskException
The sync_alloc parameter in the API determines if the allocation request is for synchronous or asynchronous mode. Set the sync_alloc parameter to true for the synchronous flow.
The subnet allocation requests can be created for a requesting service with:
The redeploy type set to the default type or the redeployType type.
The CIDR mask length can be set to invert the subnet mask length for Boolean operations with IP addresses or set to not be able to invert the subnet mask length.
Pass the starting IP address of the subnet to the requesting service redeploy type (default/redeployType).
The following are the Java APIs for synchronous or asynchronous IP allocation requests.
Common Exceptions Raised by Java APIs for Allocation Not Successful
The API throws the following exception error if the requested resource pool does not exist: ResourceErrorException
The API throws the following exception error if the requested resource pool is exhausted: AddressPoolException
The API throws the following exception error if the requested netmask is invalid: InvalidNetmaskException
Once the requesting service requests allocation through an API call, you can verify if the corresponding response is ready. The responses return the properties based on the request.
The following APIs help you to check if the response for the allocation request is ready.
Common Exceptions Raised by Java APIs for Errors
ResourceErrorException: If the allocation has failed, the request does not exist, or the pool does not exist.
ConfException: When there are format errors in the API request call.
IOException: When the I/O operations fail or are interrupted.
The following API reads the allocated IP subnet from the resource pool once the allocation request response is ready.
This non-service IP address allocation API is available starting from Resource Manager 4.2.8.
Common Exceptions Raised by Java APIs for Errors
ResourceErrorException: If the allocation has failed, the request does not exist, or the pool does not exist.
ResourceWaitException: If the allocation is not ready.
The RM package exposes Python APIs to manage allocation for IP subnet from the resource pool.
Below is the list of Python APIs exposed by the RM package.
The RM package exposes Python APIs to manage non-service allocation for IP subnet from the resource pool. Below is the list of Python APIs exposed by the RM package.
The RM package exposes APIs to manage ID allocation from the ID resource pool. APIs are available to request an ID, check if the allocation is ready, and read the allocation once it is ready.
The following are the asynchronous old Java APIs for ID allocation from the RM resource pool.
The following API is used to create or update an ID allocation request with non-service.
Common Exceptions Raised by Java APIs for Errors
The API may throw the below exception if no pool resource exists for the requested allocation: ResourceErrorException.
The API may throw the below exception if the ID request conflicts with another allocation or does not match the previous allocation in case of multiple owner requests: AllocationException.
The RM package exposes the responseReady Java API to verify if the ID allocation request is ready or not.
The following APIs are used to verify if the response is ready for an ID allocation request.
Common Exceptions Raised by Java APIs for Errors
The API may throw the below exception if no pool resource exists for the requested allocation: ResourceException.
The API may throw the below exception when there are format errors in the API request call: ConfException.
The API may throw the below exception when the I/O operations fail or are interrupted: IOException.
The following API reads information about specific allocation requests made by the API call. The response returns the allocated ID from the ID pool.
The following are the synchronous/asynchronous new Java APIs exposed by the RM package for ID allocation from the resource pool.
Common Exceptions Raised by Java APIs for Errors
The API may throw the below exception if no pool resource exists for the requested allocation: ResourceErrorException.
The API may throw the below exception if the ID request conflicts with another allocation or does not match the previous allocation in case of multiple owner requests: AllocationException.
The RM package also exposes Python APIs to request ID allocation from a resource pool. The below APIs are Python APIs exposed by RM for ID allocation.
The RM package also exposes Python APIs to request ID allocation from a resource pool by passing the maapi object and transaction handle instead of the service. The below APIs are Python APIs for non-service ID allocation.
Use the module resource_manager.id_allocator.
Set the Java Debug
Check the Log File
RM processing logs are in the file ncs-java-vm.log. Here is an example RM API entry point message called from the services:
Use the RM Action Tool