Develop services and applications in NSO.
Useful information to help you get started with NSO development.
This section describes some recipes, tools, and other resources that you may find useful throughout development. The topics are tailored to novice users and focus on making development with NSO a more enjoyable experience.
Many developers prefer their own, dedicated NSO instance to avoid their work clashing with other team members. You can use either a local or remote Linux machine (such as a VM), or a macOS computer for this purpose.
The advantage of running local Linux with a GUI or macOS is that it is easier to set up the Integrated Development Environment (IDE) and other tools when they run on the same system as NSO. However, many IDEs today also allow working remotely, such as through the SSH protocol, making the choice of local versus remote less of a concern.
For development, using the so-called Local Install of NSO has some distinct advantages:
It does not require elevated privileges to install or run.
It keeps all NSO files in the same place (user-defined).
It allows you to quickly switch between projects and NSO versions.
If you work with multiple projects in parallel, local install also allows you to take advantage of Python virtual environments to separate Python packages per project; simply start the NSO instance in an environment you have activated.
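For example, a typical shell session for starting a project-specific instance might look like this (paths and the NSO version are illustrative):

```
$ python3 -m venv venv && source venv/bin/activate   # per-project Python packages
$ source ~/nso-6.2/ncsrc                             # activate a local install (example path)
$ cd ~/projects/my-project/nso-instance
$ ncs                                                # start NSO in this runtime directory
```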
The main downside of using a local install is that it differs slightly from a system (production) install, such as in the filesystem paths used and the out-of-the-box configuration.
See Local Install for installation instructions.
There are a number of examples and showcases in this guide. We encourage you to follow them through. They are also a great reference if you are experimenting with a new feature and have trouble getting it to work; you can inspect and compare with the implementation in the example.
To run the examples, you will need access to an NSO instance. A development instance described in this chapter is the perfect option for running locally. See Running NSO Examples.
Cisco also provides an online sandbox and containerized environments, such as a Learning Lab or NSO Sandbox, designed for this purpose. Refer to the NSO Docs Home site for additional resources.
Modern IDEs offer many features on top of advanced file editing support, such as code highlighting, syntax checks, and integrated debugging. While the initial setup takes some effort, the benefits of using an IDE are immense.
Visual Studio Code (VS Code) is a freely available and extensible IDE. You can add support for Java, Python, and YANG languages, as well as remote access through SSH via VS Code extensions. Consider installing the following extensions:
Python by Microsoft: Adds Python support.
Language Support for Java(TM) by Red Hat: Adds Java support.
NSO Developer Studio by Cisco: Adds NSO-specific features as described in NSO Developer Studio.
Remote - SSH by Microsoft: Adds support for remote development.
The Remote - SSH extension is especially useful when you must work with a system through an SSH session. Once you connect to the remote host by clicking the >< button (typically found in the bottom-left corner of the VS Code window), you can open and edit remote files with ease. If you also want language support (syntax highlighting and the like), you may need to install VS Code extensions remotely. That is, install the extensions after you have connected to the remote host; otherwise, the extension installation screen might not show the option for installation on the connected host.
You will also benefit greatly from setting up SSH certificate authentication if you are using an SSH session for your work.
Once you get familiar with NSO development and gain some experience, a single NSO instance is likely to be insufficient; either because you need instances for unit testing, because you need one-off (throwaway) instances for an experiment, or something else entirely.
NSO includes tooling to help you quickly set up new local instances when such a need arises.
The following recipe relies on the ncs-setup command, which is available in the local install variant and requires a correctly set up shell environment (e.g. by running source ncsrc). See Local Install for details.
A new instance typically needs a few things to be useful:
Packages
Initial data
Devices to manage
In its simplest form, the ncs-setup invocation requires only a destination directory. However, you can specify additional packages to use with the --package option. Use the option to add as many packages as you need.
Running ncs-setup creates the required filesystem structure for an NSO instance. If you wish to include initial configuration data, put the XML-encoded data in the ncs-cdb subdirectory and NSO will load it at the first start, as described in Initialization Files.
NSO also needs to know about the managed devices. In case you are using ncs-netsim simulated devices (described in Network Simulator), you can use the --netsim-dir option with ncs-setup to add them directly. Otherwise, you may need to create some initial XML files with the relevant device configuration data — much like how you would add a device to NSO manually.
Most of the time, you must also invoke a sync with the device so that NSO has a correct copy of the device configuration. If you wish to push some initial configuration to the device, you may add the configuration in the form of initial XML data and perform a sync-to. Alternatively, you can simply do a sync-from. You can use the ncs_cmd command for this purpose.
Combining all of this together, consider the following example:
Start by creating a new directory to hold the files:
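For example (the directory name nso-throwaway is reused at the end of this recipe):

```
$ mkdir nso-throwaway
$ cd nso-throwaway
```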
Create and start a few simulated devices with ncs-netsim, using ./netsim as the directory:
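A sketch using one of the example NEDs shipped with NSO (the NED name and device count are illustrative):

```
$ ncs-netsim create-network $NCS_DIR/packages/neds/cisco-ios-cli-3.8 3 c --dir ./netsim
$ ncs-netsim start
```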
Next, create the running directory with the NED package for the simulated devices and one more package. Also, add configuration data to NSO on how to connect to these simulated devices.
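Continuing the sketch, ncs-setup picks up the NED package from the netsim directory and generates the device connection data; the extra package path is an assumption:

```
$ ncs-setup --dest ncs-run --netsim-dir ./netsim \
            --package ~/packages/my-other-package
```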
Now you can add custom initial data as XML files to ncs-run/ncs-cdb/. Usually, you would use existing files, but you can also create them on the fly.
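For example, you could create a file with some global device settings (the timeout value is arbitrary):

```
$ cat > ncs-run/ncs-cdb/global-settings.xml <<'EOF'
<config xmlns="http://tail-f.com/ns/config/1.0">
  <devices xmlns="http://tail-f.com/ns/ncs">
    <global-settings>
      <connect-timeout>30</connect-timeout>
    </global-settings>
  </devices>
</config>
EOF
```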
At this point, you are ready to start NSO:
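For example:

```
$ cd ncs-run
$ ncs
```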
Finally, request an initial sync-from:
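One way to do this non-interactively is with ncs_cmd (the admin user is assumed to exist, as in the default setup):

```
$ ncs_cmd -u admin -c 'maction /devices/sync-from'
```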
The instance is now ready for work. Once you are finished, you can stop it with ncs --stop. Remember to also stop the simulated devices with ncs-netsim stop if you no longer need them. Then, delete the containing folder (nso-throwaway) to remove all the leftover files and data.
Develop NSO services using Visual Studio (VS) Code extensions.
NSO Developer Studio provides an integrated framework for developing NSO services using Visual Studio (VS) Code extensions. The extensions come with a core feature set to help you create services and connect to running CDB instances from within the VS Code environment. The following extensions are available as part of the NSO Developer Studio:
NSO Developer Studio - Developer: Used for creating NSO services. Also referred to as NSO Developer extension in this guide.
NSO Developer Studio - Explorer: Used for connecting to and inspecting NSO instance. Also referred to as NSO Explorer extension in this guide.
Throughout this guide, references to the VS Code GUI elements are made. It is recommended that you understand the GUI terminology before proceeding. To familiarize yourself with the VS Code GUI terminology, refer to VS Code UX Guidelines.
CodeLens is a VS Code feature to facilitate performing inline contextual actions. See Extensions using CodeLens for more information.
Contribute
If you feel certain code snippets would be helpful, or if you would like to help enhance the extension, please get in touch: jwycoff@cisco.com.
This section describes the installation and functionality of the NSO Developer extension.
The purpose of the NSO Developer extension is to provide a base framework for developers to create their own NSO services. The focus of this guide is to demonstrate the creation of a simple NSO service package using the NSO Developer extension. At this time, reactive FASTMAP and Nano services are not supported with this extension.
In terms of an NSO package, the extension supports YANG, XML, and Python to bring together various elements required to create a simple service.
After the installation, you can use the extension to create services and perform additional functions described below.
To get started with development using the NSO Developer extension, ensure that the following prerequisites are met on your system. The prerequisites are not required to install the extension itself; they are needed for NSO development once the extension is installed.
Visual Studio Code.
Java JDK 11 or higher.
Python 3.9 or higher (recommended).
Installation of the NSO Developer extension is done via the VS Code marketplace.
To install the NSO Developer extension in your VS Code environment:
Open VS Code and click the Extensions icon on the Activity Bar.
Search for the extension using the keywords "nso developer studio" in the Search Extensions in Marketplace field.
In the search results, locate the extension (NSO Developer Studio - Developer) and click Install.
Wait while the installation completes. A notification at the bottom-right corner indicates that the installation has finished. After the installation, an NSO icon is added to the Activity Bar.
Use the Make Package command in VS Code to create a new Python package. The purpose of this command is to provide functionality similar to the ncs-make-package CLI command, that is, to create a basic structure for you to start developing a new Python service package. The ncs-make-package command, however, comes with several additional options to create a package.
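For comparison, a roughly equivalent CLI invocation might be (package name illustrative):

```
$ ncs-make-package --service-skeleton python mySimpleService
```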
To make a new Python service package:
In the VS Code menu, go to View, and choose Command Palette.
In the Command Palette, type or pick the command NSO: Make Package. This brings up the Make Package dialog where you can configure package details.
In the Make Package dialog, specify the following package details:
Package Name: Name of the package.
Package Location: Destination folder where the package is to be created.
Namespace: Namespace of the YANG module, e.g., http://www.cisco.com/myModule.
Prefix: The prefix to be given to the YANG module, e.g., msp.
Yang Version: The YANG version that this module follows.
Click Create Package. This creates the required package and opens up a new instance of VS Code with the newly created NSO package.
If the Workspace Trust dialog is shown, click Yes, I Trust the Authors.
Use the Open Existing Package command to open an already existing package.
To open an existing package:
In the VS Code menu, go to View, then choose Command Palette.
In the Command Palette, type or pick the command NSO: Open Existing Package.
Browse for the package on your local disk and open it. This brings up a new instance of VS Code and opens the package in it.
Opening a YANG file for editing may result in VS Code reporting syntax errors in the YANG file. The errors show up due to a missing path to the NSO YANG files and can be resolved using the following procedure.
Add YANG models for Yangster
For YANG support, a third-party extension called Yangster is used. Yangster is able to resolve imports for core NSO models but requires additional configuration.
To add YANG models for Yangster:
Create a new file named yang.settings by right-clicking in the blank area of the Explorer view and choosing New File from the pop-up.
Locate the NSO source YANG files on your local disk and copy the path.
In the yang.settings file, enter the path in JSON format: { "yangPath": "<path to YANG files>" }, for example, { "yangPath": "/home/my-user-name/nso-6.0/src/ncs/yang" }. On Microsoft Windows, make sure that the backslash (\) is escaped, e.g., "C:\\user\\folder\\src\\yang".
Save the file.
Wait while the Yangster extension indexes and parses the YANG file to resolve NSO imports. After the parsing is finished, errors in the YANG file will disappear.
YANG diagram is a feature provided by the Yangster extension.
To view the YANG diagram:
Update the YANG file. (Pressing Ctrl+space brings up auto-completion where applicable.)
Right-click anywhere in the VS Code Editor area and select Open in Diagram in the pop-up.
To add a new YANG module:
In the Explorer view, navigate to the yang folder and select it.
Right-click on the yang folder and select NSO: Add Yang Module from the pop-up menu. This brings up the Create Yang Module dialog where you can configure module details.
In the Create Yang Module dialog, fill in the following details:
Module Name: Name of the module.
Namespace: Namespace of the module, e.g., http://www.cisco.com/myModule.
Prefix: Prefix for the YANG module.
Yang Version: Version of YANG for this module.
Click Finish. This creates and opens up the newly created module.
Often while working on a package, there is a requirement to create a new service. This usually involves adding a service point. Adding a service point also requires other parts of the files to be updated, for example, Python.
Service points are usually added to lists.
To add a service point:
Update your YANG model as required. The extension automatically detects the list elements and displays a CodeLens called Add Service Point. An example is shown below.
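For instance, a plain list like the following (a hypothetical fragment, names illustrative) would be offered the Add Service Point CodeLens:

```yang
list my-simple-service {
  key name;
  leaf name {
    type string;
  }
}
```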
Click the Add Service Point CodeLens. This brings up the Add Service Point dialog.
Fill in the Service Point ID that is used to identify the service point, for example, mySimpleService.
Next, in the Python Details section, use the Python Module field to either create a new Python module or select an existing one.
If you opt to create a new Python file, the relevant sections of package-meta-data.xml are automatically updated.
If you select an existing Python module from the list, it is assumed that you are selecting the correct module and that it has been created correctly, i.e., the package-meta-data.xml file is updated with the component definition.
Enter the Service CB Class, for example, SimpleServiceCB.
Finish creating the service by clicking Add Service Point.
All action points in a YANG model must be registered in NSO. Registering an action point also requires other parts of the files to be updated, for example, Python (register_action), and package-meta-data.xml if needed.
Action points are usually defined on lists or containers.
To register an action point:
Update your YANG model as required. The extension automatically detects the action point elements in YANG and displays a CodeLens called Add Action Point. An example is shown below.
Note that it is mandatory to specify tailf:actionpoint <actionpointname> under tailf:action <actionname>. This is a known limitation.
The action point CodeLens at this time only works for the tailf:action statement, and not for the YANG rpc or YANG 1.1 action statements.
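A hypothetical fragment that the CodeLens would act on (assumes the module imports tailf-common with the tailf prefix):

```yang
container system {
  tailf:action ping {
    tailf:actionpoint ping-point;
    input {
      leaf host { type string; }
    }
    output {
      leaf result { type string; }
    }
  }
}
```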
Click the Add Action Point CodeLens. This brings up the Register Action Point dialog.
Next, in the Python Details section, use the Python Module field to either create a new Python module or select an existing one.
If you opt to create a new Python file, the relevant sections of package-meta-data.xml are automatically updated.
If you select an existing Python module from the list, it is assumed that you are selecting the correct module and that it has been created correctly, i.e., the package-meta-data.xml file is updated with the component definition.
Enter the action class name in the Main Class name used as entry point field, for example, MyAction.
Finish by clicking Register Action Point.
Opening a Python file uses the Microsoft Pylance extension. This extension provides syntax highlighting and other features such as code completion.
To resolve NCS import errors with the Pylance extension, you need to configure the path to the NSO Python API in the VS Code settings. To do this, go to VS Code Preferences > Settings and type python.analysis.extraPaths in the Search settings field. Next, click Add Item and enter the path to the NSO Python API, for example, /home/my-user-name/nso-6.0/src/ncs/pyapi. Press OK when done.
To add a new Python module:
In the Primary Sidebar, Explorer view, right-click on the python folder.
Select NSO: Add Python Module from the pop-up. This brings up the Create Python Module dialog.
In the Create Python Module dialog, fill in the following details:
Module Name: Name of the module, for example, MyServicePackage.service.
Component Name: Name of the component that will be used to identify this module, for example, service.
Class Name: Name of the class to be invoked, for example, Main.
Click Finish.
Pre-defined snippets in VS Code allow for NSO Python code completion.
To use a Python code completion snippet:
Open a Python file for editing.
Type in one of the following pre-defined texts to display snippet options:
maapi: to view options for creating a maapi write transaction.
ncs: to view options for the ncs template and variables snippets.
Select a snippet from the pop-up to insert its code. This also highlights config items that can be changed. Press the Tab key to cycle through each value.
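As an illustration, the kind of code such a snippet expands to might look as follows (a sketch using the NSO Python API; the user and context names are placeholders):

```python
import ncs

# open a single write transaction and apply a change
with ncs.maapi.single_write_trans('admin', 'python') as t:
    root = ncs.maagic.get_root(t)
    # make configuration changes on root here, then apply
    t.apply()
```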
The final part of a typical service development is creating and editing the XML configuration template.
Add a New XML Template
To add a new XML template:
In the Primary Sidebar, Explorer view, right-click on the templates folder.
Select NSO: Add XML Template from the pop-up. This brings up the Add XML Template dialog.
In the Add XML Template dialog, fill in the XML Template name, for example, mspSimpleService.
Click Finish.
Use XML Code Completion Snippets
Pre-defined snippets in VS Code allow for NSO XML code completion of processing instructions and variables.
To use an XML code completion snippet:
Open an XML file for editing.
Type in one of the following pre-defined texts to display snippet options:
For processing instructions: <? followed by a character, for example, <?i to view snippets for an if statement. All supported processing instructions are available as snippets.
For variables: $ followed by a character (or characters) matching the variable name, for example, $VA to view the variable snippet. Variables defined in the XML template via the <?set processing instruction or defined in Python code are displayed.
Note: Auto-completion can also be triggered by pressing the Ctrl+Space keys.
Select an option from the pop-up to insert the relevant XML processing instruction or variable. Items that require further configuration are highlighted. Press the Tab key to cycle through the items.
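For example, a template fragment combining a variable definition, a variable reference, and an if processing instruction might look like this (a sketch; the element names depend on your data model):

```xml
<?set DESCRIPTION='managed by NSO'?>
<interface>
  <name>{$INTERFACE_NAME}</name>
  <?if {$INTERFACE_NAME != ''}?>
    <description>{$DESCRIPTION}</description>
  <?end?>
</interface>
```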
XML Code Validation
The NSO Developer extension also performs code validation wherever possible. The following warning and error messages are shown if the extension is unable to validate the code:
A warning is shown if a user enters a variable in an XML template that is not detected by the NSO Developer extension.
An error message is shown if the ending tags in a processing instruction do not match.
The extension provides help on a best-effort basis by showing error messages and warnings wherever possible. Still, in certain situations, code validation is not possible. An example of such a limitation is when the extension is not able to detect a template variable that is defined elsewhere and passed indirectly (i.e., the variable is not directly called).
Consider the following code, for example, where the extension will successfully detect that a template variable IP_ADDRESS has been set:

```python
vars.add('IP_ADDRESS', '192.168.0.1')
```
Now consider the following code. While it serves the same purpose, the variable will not be detected:

```python
ip_add_var_name = 'IP_ADDRESS'
vars.add(ip_add_var_name, '192.168.0.1')
```
This section describes the installation and functionality of the NSO Explorer extension.
The purpose of the NSO Explorer extension is to allow the user to connect to a running instance of NSO and navigate the CDB from within VS Code.
To get started with the NSO Explorer extension, ensure that the following prerequisites are met on your system. The prerequisites are not required to install the extension itself; they are needed for NSO development once the extension is installed.
Visual Studio Code.
Java JDK 11 or higher.
Python 3.9 or higher (recommended).
Installation of the NSO Explorer extension is done via the VS Code marketplace.
To install the NSO Explorer extension in your VS Code environment:
Open VS Code and click the Extensions icon on the Activity Bar.
Search for the extension using the keywords "nso developer studio" in the Search Extensions in Marketplace field.
In the search results, locate the extension (NSO Developer Studio - Explorer) and click Install.
Wait while the installation completes. A notification at the bottom-right corner indicates that the installation has finished. After the installation, an NSO icon is added to the Activity Bar.
The NSO Explorer extension allows you to connect to and inspect a live NSO instance from within the VS Code. This procedure assumes that you have not previously connected to an NSO instance.
To connect to an NSO instance:
In the Activity Bar, click the NSO icon to open NSO Explorer.
If no NSO instance is already configured, a welcome screen is displayed with an option to add a new NSO instance.
Click the Add NSO Instance button to open the Settings editor.
In the Settings editor, click the link Edit in settings.json. This opens the settings.json file for editing.
Next, edit the settings.json file with the connection details (such as the host, port, and credentials) of the running NSO instance, as defined by the extension.
Save the file when done.
If settings have been configured correctly, NSO Explorer will attempt to connect to the running NSO instance and display the NSO configuration.
Once the NSO Explorer extension is configured, the user can inspect the CDB tree.
To inspect the CDB tree, use the following functions:
Get Element Info: Click the i (info) icon on the Explorer bar, or alternatively inline next to an element in the Explorer view.
Copy KeyPath: Click the {KP} icon to copy the keypath for the selected node.
Copy XPath: Click the {XP} icon to copy the XPath for the selected node.
Get XML Config: Click the XML icon to retrieve the XML configuration for the selected node and copy it to the clipboard.
If data has changed in NSO, click the refresh button at the top of the Explorer pane to fetch it.
Perform NED version upgrades and migration.
Many services in NSO rely on NEDs to perform network provisioning. These services map service-specific configuration to the device data models, provided by the NEDs. As the NED packages can be upgraded independently, they can introduce changes in the device YANG models that cause issues for the services using them.
NSO provides tools to migrate between backward incompatible NED versions. The tools are designed to give you a structured analysis of which paths will change between two NED versions and visibility into the scope of the potential impact that a change in the NED will drive in the service code.
The tools allow for a usage-based analysis of which parts of the NED data model (and instance tree) a particular service has written to. This will give you an (at least opportunistic) sense of which paths must change in the service code.
These features aim to lower the barrier of upgrading NEDs and significantly reduce the amount of uncertainty and side effects that NED upgrades were historically associated with.
migrate Action
By using the /ncs:devices/device/migrate action, you can change the NED major/minor version of a device. The action migrates all configuration and service meta-data. The action can also be executed in parallel on a device group or on all devices matching a NED identity. The procedure for migrating devices is further described in NED Migration.
Additionally, the example examples.ncs/getting-started/developing-with-ncs/26-ned-migration in the NSO examples collection illustrates how to migrate devices between different NED versions using the migrate action.
What makes it particularly useful to a service developer is that the action reports what paths have been modified and the service instances affected by those changes. This information can then be used to prepare the service code to handle the new NED version. If the verbose option is used, all service instances are reported instead of just the service points. If the dry-run option is used, the action simply reports what it would do. This gives you the chance to analyze before any actual change is performed.
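For example, a dry run of the migration from the CLI might look like this (the device name and NED ID are illustrative):

```
admin@ncs# devices device c1 migrate new-ned-id cisco-ios-cli-6.90 dry-run
```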
Deep dive into service implementation.
Before you Proceed
This section discusses the implementation details of services in NSO. The reader should already be familiar with the concepts described in the introductory sections and Implementing Services.
For an introduction to services, see Develop a Simple Service instead.
Each service type in NSO extends a part of the data model (a list or a container) with the ncs:servicepoint statement and the ncs:service-data grouping. This is what defines an NSO service.
The service point instructs NSO to involve the service machinery (Service Manager) for management of that part of the data tree, and the ncs:service-data grouping contains definitions common to all services in NSO. Defined in tailf-ncs-services.yang, ncs:service-data includes parts that are required for the proper operation of FASTMAP and the Service Manager. Every service must therefore use this grouping as part of its data model.
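In YANG, a minimal service definition following this pattern might look like this (a sketch; assumes the module imports tailf-ncs with the ncs prefix):

```yang
list my-service {
  key name;
  leaf name {
    type string;
  }

  uses ncs:service-data;
  ncs:servicepoint my-service-servicepoint;
}
```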
In addition, ncs:service-data provides a common service interface to the users, consisting of a set of service actions (such as check-sync, deep-check-sync, re-deploy, un-deploy, and get-modifications) and related operational data.
While not part of ncs:service-data as such, you may consider the service-commit-queue-event notification part of the core service interface. The notification provides information about the state of the service when the service uses the commit queue. As an example, an event-driven application uses this notification to find out when a service instance has been deployed to the devices. See the showcase_rc.py script in examples.ncs/development-guide/concurrency-model/perf-stack/ for sample Python code leveraging the notification. See tailf-ncs-services.yang for the full definition of the notification.
NSO Service Manager is responsible for providing the functionality of the common service interface, requiring no additional user code. This interface is the same for classic and nano services, whereas nano services further extend the model.
NSO calls into Service Manager when accessing actions and operational data under the common service interface, or when the service instance configuration data (the data under the service point) changes. NSO being a transactional system, configuration data changes happen in a transaction.
When applied, a transaction goes through multiple stages, as shown by the progress trace (e.g. using commit | details in the CLI). The detailed output breaks up the transaction into four distinct phases:
validation
write-start
prepare
commit
These phases deal with how the network-wide transactions work:
The validation phase prepares and validates the new configuration (including NSO copy of device configurations), then the CDB processes the changes and prepares them for local storage in the write-start phase.
The prepare stage sends out the changes to the network through the Device Manager and the HA system. The changes are staged (e.g. in the candidate data store) and validated if the device supports it, otherwise, the changes are activated immediately.
If all systems took the new configuration successfully, the transaction enters the commit phase, marking the new NSO configuration as active and activating or committing the staged configuration on remote devices. Otherwise, it enters the abort phase, discarding the changes and asking NEDs to revert activated changes on devices that do not support transactions (e.g. those without a candidate data store).
There are also two types of locks involved with the transaction that are of interest to the service developer; the service write lock and the transaction lock. The latter is a global lock, required to serialize transactions, while the former is a per-service-type lock for serializing services that cannot be run in parallel. See Scaling and Performance Optimization for more details and their impact on performance.
The first phase, historically called validation, does more than just validate data and is the phase a service deals with the most. The other three support the NSO service framework, and a service developer rarely interacts with them directly.
We can further break down the first phase into the following stages:
rollback creation
pre-transform validation
transforms
full data validation
conflict check and transaction lock
When the transaction starts applying, NSO captures the initial intent and creates a rollback file, which allows one to reverse or roll back the intent. For example, the rollback file might contain the information that you changed a service instance parameter but it would not contain the service-produced device changes.
Then the first, partial validation takes place. It ensures the service input parameters are valid according to the service YANG model, so the service code can safely use provided parameter values.
Next, NSO runs transaction hooks and performs the necessary transforms, which alter the data before it is saved, for example encrypting passwords. This is also where the Service Manager invokes FASTMAP and service mapping callbacks, recording the resulting changes. NSO takes service write locks in this stage, too.
After transforms, there are no more changes to the configuration data, and the full validation starts, including YANG model constraints over the complete configuration, custom validation through validation points, and configuration policies (see Policies in Operation and Usage).
Throughout the phase, the transaction engine makes checkpoints, so it can restart the transaction faster in case of concurrency conflicts. The check for conflicts happens at the end of this first phase when NSO also takes the global transaction lock. Concurrency is further discussed in NSO Concurrency Model.
The main callback associated with a service point is the create callback, designed to produce the required (new) configuration, while FASTMAP takes care of the other operations, such as update and delete.
NSO implements two additional, optional callbacks for scenarios where create is insufficient. These are pre- and post-modification callbacks that NSO invokes before (pre) or after (post) create. These callbacks work outside of the scope tracked by FASTMAP. That is, changes done in pre- and post-modification do not automatically get removed during the update or delete of the service instance.
For example, you can use the pre-modification callback to check the service prerequisites (pre-check) or make changes that you want persisted even after the service is removed, such as enabling some global device feature. The latter may be required when NSO is not the only system managing the device and removing the feature configuration would break non-NSO managed services.
Similarly, you might use post-modification to reset the configuration to some default after the service is removed. Say the service configures an interface on a router for customer VPN. However, when the service is deprovisioned (removed), you don't want to simply erase the interface configuration. Instead, you want to put it in shutdown and configure it for a special, unused VLAN. The post-modification callback allows you to achieve this goal.
The main difference from the create callback is that pre- and post-modification are called on update and delete, as well as on service create. Since the service data node may no longer exist in the case of delete, the API for these callbacks does not supply the service object. Instead, the callback receives the operation and the key path to the service instance. See the following API signatures for details.
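For reference, the Python callback signatures look roughly like this (see ncs.application.Service in the NSO Python API):

```python
from ncs.application import Service

class ServiceCallbacks(Service):
    @Service.create
    def cb_create(self, tctx, root, service, proplist):
        pass

    @Service.pre_modification
    def cb_pre_modification(self, tctx, op, kp, root, proplist):
        pass

    @Service.post_modification
    def cb_post_modification(self, tctx, op, kp, root, proplist):
        pass
```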
The Python callbacks use the following function arguments:
tctx: A TransCtxRef object containing transaction data, such as user session and transaction handle information.
op: Integer representing the operation: create (ncs.dp.NCS_SERVICE_CREATE), update (ncs.dp.NCS_SERVICE_UPDATE), or delete (ncs.dp.NCS_SERVICE_DELETE) of the service instance.
kp: A HKeypathRef object with the key path of the affected service instance, such as /svc:my-service{instance1}.
root: A Maagic node for the root of the data model.
service: A Maagic node for the service instance.
proplist: Opaque service properties; see Persistent Opaque Data.
The Java callbacks use the following function arguments:
context: A ServiceContext object for accessing the root and service instance NavuNode in the current transaction.
operation: ServiceOperationType enum representing the operation: CREATE, UPDATE, or DELETE of the service instance.
path: A ConfPath object with the key path of the affected service instance, such as /svc:my-service{instance1}.
ncsRoot: A NavuNode for the root of the ncs data model.
service: A NavuNode for the service instance.
opaque: Opaque service properties; see Persistent Opaque Data.
See the examples.ncs/development-guide/services/post-modification-py and examples.ncs/development-guide/services/post-modification-java examples for a sample implementation of the post-modification callback.
Additionally, you may implement these callbacks with templates. Refer to Service Callpoints and Templates for details.
FASTMAP greatly simplifies service code, so it usually only needs to deal with the initial mapping. NSO achieves this by first discarding all the configuration performed during the create callback of the previous run. In other words, the service create code always starts anew, with a blank slate.
If you need to keep some private service data across runs of the create callback, or pass data between callbacks, such as pre- and post-modification, you can use opaque properties.
The opaque object is available in the service callbacks as an argument, typically named proplist (Python) or opaque (Java). It contains a set of named properties with their corresponding values.
If you wish to use the opaque properties, it is crucial that your code returns the properties object from the create call, otherwise, the service machinery will not save the new version.
Like the pre- and post-modification callbacks, opaque properties persist data outside of FASTMAP. However, NSO deletes the opaque data when the service instance is deleted, unlike the data produced by the pre- and post-modification callbacks.
The examples.ncs/development-guide/services/post-modification-py and examples.ncs/development-guide/services/post-modification-java examples showcase the use of opaque properties.
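A minimal Python sketch of the pattern (the property name is illustrative):

```python
import time
from ncs.application import Service

class ServiceCallbacks(Service):
    @Service.create
    def cb_create(self, tctx, root, service, proplist):
        # proplist is a list of (name, value) string tuples
        props = dict(proplist)
        if 'FIRST_DEPLOY_TIME' not in props:
            # computed once on first create, preserved across re-deploys
            props['FIRST_DEPLOY_TIME'] = str(int(time.time()))
        # ... produce the service configuration here ...
        return list(props.items())   # always return the opaque object
```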
NSO by default enables concurrent scheduling and execution of services to maximize throughput. However, concurrent execution can be problematic for non-thread-safe services or services that are known to always conflict with themselves or other services, such as when they read and write the same shared data. See NSO Concurrency Model for details.
To prevent NSO from scheduling a service instance together with an instance of another service, declare a static conflict in the service model, using the ncs:conflicts-with extension. The following example shows a service with two declared static conflicts: one with itself and one with another service, named other-service.
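A sketch of what such a declaration might look like in the service YANG (the placement of ncs:conflicts-with under the service point follows the documented pattern; module imports are assumed):

```yang
list example-service {
  key name;
  leaf name { type string; }

  uses ncs:service-data;
  ncs:servicepoint example-service-servicepoint {
    ncs:conflicts-with example-service;
    ncs:conflicts-with other-service;
  }
}
```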
This means each service instance will wait for other service instances that have started sooner than this one (and are of example-service or other-service type) to finish before proceeding.
FASTMAP knows that a particular piece of configuration belongs to a service instance, allowing NSO to revert the change as needed. But what happens when several service instances share a resource that may or may not exist before the first service instance is created? If the service implementation naively checks for existence and creates the resource when it is missing, then the resource will be tracked with the first service instance only. If, later on, this first instance is removed, then the shared resource is also removed, affecting all other instances.
A well-known solution to this kind of problem is reference counting. NSO uses reference counting by default with the XML templates and the Python Maagic API, while in the Java Maapi and Navu APIs, the sharedCreate(), sharedSet(), and sharedSetValues() functions need to be used.
When enabled, the reference counter allows the FASTMAP algorithm to keep track of the usage and only delete data when the last service instance referring to it is removed.
Furthermore, containers and list items created using the sharedCreate() and sharedSetValues() functions also get an additional attribute called backpointer. (This functionality is currently not available for individual leafs.)
The backpointer points back to the service instance that created the entity in the first place. This makes it possible to look at part of the configuration, say under the /devices tree, and answer the question: which parts of the device configuration were created by which service?
To see reference counting in action, start the examples.ncs/implement-a-service/iface-v3 example with make demo and configure a service instance.
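For example (service parameters as used by the iface example; adjust to your model):

```
admin@ncs(config)# iface instance1 device c1 interface 0/1 ip-address 10.1.2.3 cidr-netmask 28
admin@ncs(config)# commit
```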
Then configure another service instance with the same parameters and use the display service-meta-data pipe to show the reference counts and backpointers:
Notice how commit dry-run produces no new device configuration, but the system still tracks the changes. If you wish, remove the first instance and verify that the GigabitEthernet 0/1 configuration is still there; it is gone only when you also remove the second instance.
But what happens if the two services produce different configurations for the same node? Say, one sets the IP address to 10.1.2.3 and the other to 10.1.2.4. Conceptually, these two services are incompatible, and instantiating both at the same time produces a broken configuration (instantiating the second service instance breaks the configuration for the first). What is worse is that the current configuration depends on the order in which the services were deployed or re-deployed. For example, re-deploying the first service will change the configuration from 10.1.2.4 back to 10.1.2.3 and vice versa. Such inconsistencies break the declarative configuration model and really should be avoided.
In practice, however, NSO does not prevent services from producing such configuration. But note that we strongly recommend against it and that there are associated limitations, such as service un-deploy not reverting configuration to that produced by the other instance (but when all services are removed, the original configuration is still restored).
The commit | debug service pipe command warns about any such conflict that it finds but may miss conflicts on individual leafs. The best practice is to use integration tests in the service development life cycle to ensure there are no conflicts, especially when multiple teams develop their own set of services that are to be deployed on the same NSO instance.
Much like a service in NSO can provision device configurations, it can also provision other, non-device data, as well as other services. We call the approach of services provisioning other services 'service stacking' and the services that are involved — 'stacked'.
Service stacking concepts usually come into play for bigger, more complex services. There are a number of reasons why you might prefer stacked services to a single monolithic one:
Smaller, more manageable services with simpler logic.
Separation of concerns and responsibility.
Clearer ownership across teams for (parts of) overall service.
Smaller services reusable as components across the solution.
Avoiding overlapping configuration between service instances causing conflicts, such as using one service instance per device (see examples in Designing for Maximal Transaction Throughput).
Stacked services are also the basis for LSA, which takes this concept even further. See Layered Service Architecture for details.
The standard naming convention with stacked services distinguishes between a Resource-Facing Service (RFS), that directly configures one or more devices, and a Customer-Facing Service (CFS), that is the top-level service, configuring only other services, not devices. There can be more than two layers of services in the stack, too.
While NSO does not prevent a single service from configuring devices as well as services, in the majority of cases this results in a less clean design and is best avoided.
Overall, creating stacked services is very similar to the non-stacked approach. First, you can design the RFS services as usual. Actually, you might take existing services and reuse those. These then become your lower-level services, since they are lower in the stack.
Then you create a higher-level service, say a CFS, that configures another service, or a few, instead of a device. You can even use a template-only service to do that, such as:
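For example, the XML template of such a CFS might configure the lower-level service directly, much like it would configure a device (a sketch; the iface namespace and parameter values are hypothetical):

```xml
<config-template xmlns="http://tail-f.com/ns/config/1.0">
  <iface xmlns="http://example.com/iface">
    <name>{/name}</name>
    <device>c1</device>
    <interface>0/1</interface>
    <ip-address>10.1.2.3</ip-address>
    <cidr-netmask>28</cidr-netmask>
  </iface>
</config-template>
```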
The preceding example references an existing iface service, such as the one in the examples.ncs/implement-a-service/iface-v3 example. The output shows hard-coded values, but you can change those as you would for any other service.
In practice, you might find it beneficial to modularize your data model and potentially reuse parts in both, the lower- and higher-level service. This avoids duplication while still allowing you to directly expose some of the lower-level service functionality through the higher-level model.
The most important principle to keep in mind is that the data created by any service is owned by that service, regardless of how the mapping is done (through code or templates). If the user deletes a service instance, FASTMAP will automatically delete whatever the service created, including any other services. Likewise, if the operator directly manipulates service data that is created by another service, the higher-level service becomes out of sync. The check-sync service action checks this for services as well as devices.
In stacked service design, the lower-level service data is under the control of the higher-level service and must not be directly manipulated. Only the higher-level service may manipulate that data. However, two higher-level services may manipulate the same structures, since NSO performs reference counting (see Reference Counting Overlapping Configuration).
This section lists some specific advice for implementing services, as well as any known limitations you might run into.
You may also obtain some useful information by using the debug service commit pipe command, such as commit dry-run | debug service. The command displays the net effect of the service create code, as well as issues warnings about potentially problematic usage of overlapping shared data.
Service callbacks must be deterministic: NSO invokes service callbacks in a number of situations, such as for dry-run, check sync, and actual provisioning. If a service does not create the same configuration from the same inputs, NSO sees it as being out of sync, resulting in a lot of configuration churn and making it incompatible with many NSO features. If you need to introduce some randomness or rely on some other nondeterministic source of data, make sure to cache the values across callback invocations, such as by using opaque properties (see Persistent Opaque Data) or persistent operational data (see Operational Data) populated in a pre-modification callback.
Never overwrite service inputs: Service input parameters capture client intent and a service should never change its own configuration. Such behavior not only muddles the intent but is also temporary when done in the create callback, as the changes are reverted on the next invocation.
If you need to keep some additional data that cannot be easily computed each time, consider using opaque properties (see Persistent Opaque Data) or persistent operational data (see Operational Data) populated in a pre-modification callback.
No service ordering in a transaction: NSO is a transactional system and as such does not have the concept of order inside a single transaction. That means NSO does not guarantee any specific order in which the service mapping code executes if the same transaction touches multiple service instances. Likewise, your code should not make any assumptions about running before or after other service code.
Return value of create callback: The create callback is not the exclusive user of the opaque object; the object can be chained through several different callbacks, such as pre- and post-modification. Therefore, returning None/null from the create callback is not a good practice. Instead, always return the opaque object, even if the create callback does not use it.
Avoid delete in service create: Unlike creation, deleting configuration does not support reference counting, as there is no data left to reference count. This means the deleted elements are tied to the service instance that deleted them.
Additionally, FASTMAP must store the entire deleted tree and restore it on every service change or re-deploy, only to be deleted again. Depending on the amount of deleted data, this is potentially an expensive operation.
So, a general rule of thumb is to never use delete in service create code. If an explicit delete is used, debug service may display a warning to that effect.
However, the service may also delete data implicitly, through when and choice statements in the YANG data model. If a when statement evaluates to false, the configuration tree below that node is deleted. Likewise, if a case is set in a choice statement, the previously set case is deleted. This has the same limitations as an explicit delete.
To avoid these issues, create a separate service that only handles deletion, and use it in the main service through the stacked service design (see Stacked Services). This approach allows you to reference count the deletion operation and contains the effect of restoring deleted data through a small, rarely changing helper service. See examples.ncs/development-guide/services/shared-delete for an example.
Alternatively, you might consider pre- and post-modification callbacks for some specific cases.
Prefer shared*() functions: Non-shared create and set operations in the Java and Python low-level APIs do not add reference counts or backpointer information to changed elements. In case there is overlap with another service, unwanted removal can occur. See Reference Counting Overlapping Configuration for details.
In general, you should prefer sharedCreate(), sharedSet(), and sharedSetValues(). If non-shared variants are used in a shared context, debug service displays a warning to that effect.
Likewise, do not use the MAAPI load_config variants from the service code. Use the sharedSetValues() function to load XML data from a file or a string.
Reordering ordered-by-user lists: If the service code rearranges an ordered-by-user list with items that were created by another service, that other service becomes out of sync. In some cases, you might be able to avoid out-of-sync scenarios by leveraging special XML template syntax (see Operations on ordered lists and leaf-lists) or using service stacking with a helper service.
In general, however, you should reconsider your design and try to avoid such scenarios.
Automatic upgrade of keys for existing services is unsupported: Service backpointers, described in Reference Counting Overlapping Configuration, rely on the keys that the service model defines to identify individual service instances. If you update the model by adding, removing, or changing the type of leafs used in the service list key, while there are deployed service instances, the backpointers will not be automatically updated. Therefore, it is best to not change the service list key.
A workaround, if the service key absolutely must change, is to first perform a no-networking undeploy of the affected service instances, then upgrade the model, and finally no-networking re-deploy the previously un-deployed services.
Avoid conflicting intents: Consider that a service is executed as part of a transaction. If, in the same transaction, the service gets conflicting intents, for example, it gets modified and deleted, the transaction is aborted. You must decide which intent has higher priority and design your services to avoid such situations.
A very common situation, when NSO is deployed in an existing network, is that the network already has services implemented. These services may have been deployed manually or through an older provisioning system. To take full advantage of the new system, you should consider importing the existing services into NSO. The goal is to use NSO to manage existing service instances, along with adding new ones in the future.
The process of identifying services and importing them into NSO is called Service Discovery and can be broken down into the following high-level parts:
Implementing the service to match existing device configuration.
Enumerating service instances and their parameters.
Amending the service meta-data references with reconciliation.
Ultimately, the problem that service discovery addresses is one of referencing or linking configuration to services. Since the network already contains target configuration, a new service instance in NSO produces no changes in the network. This means the new service in NSO by default does not own the network configuration. One side effect is that removing a service will not remove the corresponding device configuration, which is likely to interfere with service modification as well.
Some of the steps in the process can be automated, while others are mostly manual. The amount of work differs a lot depending on how structured and consistent the original deployment is.
A prerequisite (or possibly the product in an iterative approach) is an NSO service that supports all the different variants of the configuration for the service that are used in the network. This usually means there will be a few additional parameters in the service model that allow selecting the variant of device configuration produced, as well as some covering other non-standard configurations (if such configuration is present).
In the simplest case, there is only one variant, and that is the one the service needs to produce. Let's take the examples.ncs/implement-a-service/iface-v2-py example and consider what happens when a device already has an existing interface configuration.
Configuring a new service instance does not produce any new device configuration (notice that device c1 has no changes).
However, when committed, NSO records the changes, just like in the case of overlapping configuration (see Reference Counting Overlapping Configuration). The main difference is that there is only a single backpointer, to the newly configured service, but the refcount is 2. The other item that contributes to the refcount is the original device configuration, which is why the configuration is not deleted when the service instance is.
A prerequisite for service discovery to work is that it is possible to construct a list of the already existing services. Such a list may exist in an inventory system, an external database, or perhaps just an Excel spreadsheet.
You can import the list of services in a number of ways. If you are reading it in from a spreadsheet, a Python script using NSO API directly (Basic Automation with Python) and a module to read Excel files is likely a good choice.
Or, you might generate an XML data file to import using the ncs_load command; use the display xml filter to help you create a template:
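For example (a sketch; the instance name is illustrative):

```
admin@ncs# show running-config iface instance1 | display xml
```

The resulting XML can then be adjusted and loaded with, e.g., ncs_load -l -m services.xml.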
Regardless of the way you implement the data import, you can run into two kinds of problems.
On one hand, the service list data may be incomplete. Suppose that the earliest service instances deployed did not take the network mask as a parameter. Moreover, for some specific reasons, a number of interfaces had to deviate from the default of 28 and that information was never populated back in the inventory for old services after the netmask parameter was added.
Now the only place where that information is still kept may be the actual device configuration. Fortunately, you can access it through NSO, which may allow you to extract the missing data automatically, for example:
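The following Python sketch illustrates the idea, reading the actual interface netmask from the copy of the device configuration in NSO (the data model path follows the cisco-ios NED and is an assumption; adjust for your NED version):

```python
import ncs

with ncs.maapi.single_read_trans('admin', 'python') as t:
    root = ncs.maagic.get_root(t)
    config = root.devices.device['c1'].config
    # read the primary IP address and mask of the interface
    primary = config.ios__interface.GigabitEthernet['0/1'].ip.address.primary
    print(primary.address, primary.mask)
```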
On the other hand, some parameters may be NSO specific, such as those controlling which variant of configuration to produce. Again, you might be able to use a script to find this information, or it could turn out that the configuration is too complex to make such a script feasible.
In general, this can be the most tricky part of the service discovery process, making it very hard to automate. It all comes down to how good the existing data is. Keep in mind that this exercise is typically also a cleanup exercise, and every network will be different.
The last step is updating the metadata, telling NSO that a given service controls (owns) the device configuration that was already present when the NSO service was configured. This is called reconciliation, and you achieve it using the special re-deploy reconcile action for the service.
Let's examine the effects of this action on the following data:
Having run the action, NSO has updated the refcount to remove the reference to the original device configuration:
What is more, the reconcile algorithm works even if multiple service instances share configuration. What if you had two instances of the iface service, instead of one?
Before reconciliation, the device configuration would show a refcount of three.
Invoking re-deploy reconcile on either one or both of the instances makes the services the sole owners of the configuration.
This means the device configuration is removed only when you remove both service instances.
The reconcile operation only removes the references to the original configuration (without the service backpointer), so you can execute it as many times as you wish. Just note that it is part of a service re-deploy, with all the implications that brings, such as potentially deploying new configuration to devices when you change the service template.
As an alternative to re-deploy reconcile, you can initially add the service configuration with a commit reconcile variant, performing reconciliation right away.
It is hard to design a service in one go when you wish to cover existing configurations that are exceedingly complex or have a lot of variance. In such cases, many prefer an iterative approach, where you tackle the problem piece-by-piece.
Suppose there are two variants of the service configured in the network: iface-v2-py and the newer iface-v3, which produces a slightly different configuration. This is a typical scenario when a different (non-NSO) automation system is used and the service gradually evolves over time. Or, when a Method of Procedure (MOP) is updated if manual provisioning is used.
We will tackle this scenario to show how you might perform service discovery in an iterative fashion. We shall start with iface-v2-py as the first iteration of the iface service, which represents what configuration the service should produce to the best of our current knowledge.
There are configurations for two service instances in the network already: for interfaces 0/1 and 0/2 on the c1 device. So, configure the two corresponding iface instances.
You can also use the commit no-deploy variant to add service parameters when a normal commit would produce device changes, which you do not want.
Then use the re-deploy reconcile { discard-non-service-config } dry-run command to observe the difference between the service-produced configuration and the one present in the network.
For instance1, the config is the same, so you can safely reconcile it already.
But interface 0/2 (instance2), which you suspect was initially provisioned with the newer version of the service, produces the following:
The output tells you that the service is missing the ip dhcp snooping trust part of the interface configuration. Since the service does not generate this part of the configuration yet, running re-deploy reconcile { discard-non-service-config } (without dry-run) would remove the DHCP trust setting. This is not what we want.
One option, and this is the default reconcile mode, would be to use keep-non-service-config instead of discard-non-service-config. But that would result in the service taking ownership of only part of the interface configuration (the IP address).
Instead, the right approach is to add the missing part to the service template. There is, however, a little problem. Adding the DHCP snooping trust configuration unconditionally to the template can interfere with the other service instance, instance1.
In some cases, upgrading the old configuration to the new variant is viable, but in most situations, you likely want to avoid all device configuration changes. For the latter case, you need to add another parameter to the service model that selects the configuration variant. You must update the template too, producing the second iteration of the service.
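A sketch of the conditional part of such a template (the variant leaf and the exact NED paths are assumptions):

```xml
<?if {variant = 'v3'}?>
  <ip>
    <dhcp>
      <snooping>
        <trust/>
      </snooping>
    </dhcp>
  </ip>
<?end?>
```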
With the updated configuration, you can now safely reconcile the instance2 service instance:
Nevertheless, keep in mind that the discard-non-service-config reconcile operation only considers parts of the device configuration under nodes that are created with the service mapping. Even if all data there is covered in the mapping, there could still be other parts that belong to the service but reside in an entirely different section of the device configuration (say, DNS configuration under ip name-server, which is outside the interface GigabitEthernet part) or even on a different device. That kind of configuration the discard-non-service-config option cannot find on its own; you must add it manually.
You can find the complete iface service as part of the examples.ncs/development-guide/services/discovery example.
Since there were only two service instances to reconcile, the process is now complete. In practice, you are likely to encounter multiple variants and many more service instances, requiring you to make additional iterations. But you can follow the iterative process shown here.
In some cases, a service may need to rely on the actual device configurations to compute the changeset. It is often a requirement to pull the current device configurations from the network before executing such a service. Doing a full sync-from on a number of devices is an expensive task, especially if it needs to be performed often. The alternative in this case is to use partial-sync-from.
In cases where a multitude of service instances touch a device that is not entirely orchestrated using NSO, i.e., relying on the partial-sync-from feature described above, and the device needs to be replaced, all services need to be re-deployed. This can be expensive depending on the number of service instances. Partial-sync-to enables the replacement of devices in a more efficient fashion.
The partial-sync-from and partial-sync-to actions allow you to specify certain portions of the device's configuration to be pulled or pushed from or to the network, respectively, rather than the full configuration. These are more efficient operations on NETCONF devices and NEDs that support the partial-show feature. NEDs that do not support the partial-show feature will fall back to pulling or pushing the whole configuration.
Even though partial-sync-from and partial-sync-to allow pulling or pushing only a part of the device's configuration, the actions are not allowed to break the consistency of configuration in CDB or on the device as defined by the YANG model. Hence, extra consideration needs to be given to dependencies inside the device model. If some configuration item A depends on configuration item B in the device's configuration, pulling only A may fail due to the unsatisfied dependency on B. In this case, both A and B need to be pulled, even if the service is only interested in the value of A.
It is important to note that partial-sync-from and partial-sync-to clear the transaction ID of the device in NSO unless the whole configuration has been selected (e.g., /ncs:devices/ncs:device[ncs:name='ex0']/ncs:config). This ensures NSO does not miss any changes to other parts of the device configuration, but it does leave the device out of sync.
sync-from
Pulling the configuration from the network needs to be initiated outside the service code. At the same time, the list of configuration subtrees required by a certain service should be maintained by the service developer. Hence, it is good practice for such a service to implement a wrapper action that invokes the generic /devices/partial-sync-from action with the correct list of paths. The user or application that manages the service would then only need to invoke the wrapper action, without needing to know which parts of the configuration the service is interested in.
The snippet in the example below (Example of running partial-sync-from action via Java API) gives an example of running the partial-sync-from action via Java, using the router device from examples.ncs/getting-started/developing-with-ncs/0-router-network.
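The call boils down to invoking the /devices/partial-sync-from action over MAAPI with a list of instance paths. A minimal sketch, assuming an open Maapi socket and the r: prefix from the router example's YANG model:

```java
import java.io.IOException;
import com.tailf.conf.ConfException;
import com.tailf.conf.ConfList;
import com.tailf.conf.ConfObject;
import com.tailf.conf.ConfObjectRef;
import com.tailf.conf.ConfPath;
import com.tailf.conf.ConfXMLParam;
import com.tailf.conf.ConfXMLParamValue;
import com.tailf.maapi.Maapi;

public class PartialSyncFromWrapper {
    // Pull only the interface subtree of device ex0 instead of a full sync-from
    public static ConfXMLParam[] pullInterface(Maapi maapi)
            throws ConfException, IOException {
        ConfXMLParam[] paths = new ConfXMLParam[] {
            new ConfXMLParamValue("ncs", "path",
                new ConfList(new ConfObject[] {
                    new ConfObjectRef(new ConfPath(
                        "/ncs:devices/device{ex0}/config/r:sys/interfaces/interface{eth0}"))
                }))
        };
        // requestAction invokes the action and returns its XML result
        return maapi.requestAction(paths, "/ncs:devices/partial-sync-from");
    }
}
```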
Learn service development in Java with Examples.
As using Java for service development may be somewhat more involved than Python, this section provides further examples and additional tips for setting up the development environment for Java.
The two examples, a simple VLAN service and a Layer 3 MPLS VPN service, are more elaborate but show the same techniques as Implementing Services.
If you or your team primarily focuses on services implemented in Python, feel free to skip or only skim through this section.
In this example, you will create a simple VLAN service in Java. In order to illustrate the concepts, the device configuration is simplified from a networking perspective and uses only a single device type (Cisco IOS).
We will first look at the following preparatory steps:
Prepare a simulated environment of Cisco IOS devices: in this example, we start from scratch in order to illustrate the complete development process. We will not reuse any existing NSO examples.
Generate a template service skeleton package: use NSO tools to generate a Java-based service skeleton package.
Write and test the VLAN Service Model.
Analyze the VLAN service mapping to IOS configuration.
These steps are no different from defining services using templates. Next, we start working with the Java environment:
Configuring the start and stop of the Java VM.
First look at the Service Java Code: introduction to service mapping in Java.
Developing by tailing log files.
Developing using Eclipse.
We will start by setting up a run-time environment that includes simulated Cisco IOS devices and configuration data for NSO. Make sure you have sourced the ncsrc file.
Create a new directory that will contain the files for this example, such as:
Now, let's create a simulated environment with 3 IOS devices and an NSO that is ready to run with this simulated network:
Start the simulator and NSO:
Use the Cisco CLI towards one of the devices:
Use the NSO CLI to get the configuration:
Finally, set VLAN information manually on a device to prepare for the mapping later.
In the run-time directory, you created:
Note the packages directory; cd to it:
Currently, there is only one package, the Cisco IOS NED.
We will now create a new package that will contain the VLAN service.
This creates a package with the following structure:
During the rest of this section, we will work with the vlan/src/yang/vlan.yang and vlan/src/java/src/com/example/vlan/vlanRFS.java files.
So, if a user wants to create a new VLAN in the network, what should the parameters be? Edit vlan/src/yang/vlan.yang as below:
This simple VLAN service model says:
We give a VLAN a name, for example net-1.
The VLAN has an id from 1 to 4096.
The VLAN is attached to a list of devices and interfaces. In order to keep this example as simple as possible, the interface name is just a string. A more correct and useful model would make this a reference to an interface on the device, but for now it is better to keep the example simple.
The VLAN service list is augmented into the services tree in NSO. This specifies the path to reach VLANs in the CLI, REST, etc. There are no requirements on where the service must be added into NSO; if you want VLANs at the top level, simply remove the augment statement.
Make sure you keep the lines generated by ncs-make-package:
The two lines, the uses ncs:service-data statement and the ncs:servicepoint statement, tell NSO that this is a service. The first line expands to a YANG structure that is shared among all services. The second line connects the service to the Java callback.
To build this service model, cd to packages/vlan/src and type make (this assumes that you have the make build system installed).
We can now test the service model by requesting NSO to reload all packages:
You can also stop and start NSO, but then you have to pass the --with-package-reload option when starting NSO. This is important: by default, NSO does not take any changes in packages into account when restarting. When packages are reloaded, the state/packages-in-use is updated.
Now, create a VLAN service (nothing will happen, since we have not defined any mapping).
Now, let us move on and connect that to some device configuration using Java mapping. Note well that Java mapping is not required; templates are more straightforward and recommended, but we use this as a "Hello World" introduction to Java service programming in NSO. At the end, we will also show how to combine Java and templates: templates are used to define a vendor-independent way of mapping service attributes to device configuration, and Java is used as a thin layer in front of the templates to implement logic, call-outs to external systems, etc.
The default configuration of the Java VM is:
By default, NSO will start the Java VM by invoking the command $NCS_DIR/bin/ncs-start-java-vm. That script will invoke:
The class NcsJVMLauncher contains the main() method. The started Java VM will automatically retrieve and deploy all Java code for the packages defined in the load path of the ncs.conf file. No specification other than the package-meta-data.xml for each package is needed.
The verbosity of Java error messages can be controlled by:
For more details on the Java VM settings, see NSO Java VM.
The service model and the corresponding Java callback are bound by the servicepoint name. Look at the service model in packages/vlan/src/yang:
The corresponding generated Java skeleton (with one 'Hello World!' print statement added):
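For reference, a sketch of what the generated skeleton looks like with the print statement added (package and class names as in the file paths above; the servicepoint name is an assumption and must match the YANG model):

```java
package com.example.vlan;

import java.util.Properties;
import com.tailf.dp.DpCallbackException;
import com.tailf.dp.annotations.ServiceCallback;
import com.tailf.dp.proto.ServiceCBType;
import com.tailf.dp.services.ServiceContext;
import com.tailf.navu.NavuNode;

public class vlanRFS {

    // Invoked by FASTMAP on service create (and on re-deploy)
    @ServiceCallback(servicePoint = "vlan-servicepoint",
                     callType = ServiceCBType.CREATE)
    public Properties create(ServiceContext context,
                             NavuNode service,
                             NavuNode ncsRoot,
                             Properties opaque)
            throws DpCallbackException {
        System.out.println("Hello World!");
        return opaque;
    }
}
```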
Modify the generated code to include the print "Hello World!" statement in the same way. Re-build the package:
Whenever a package has changed, we need to tell NSO to reload the package. There are three ways:
Reload only the implementation of a specific package (this will not load any model changes): admin@ncs# packages package vlan redeploy.
Reload all packages, including any model changes: admin@ncs# packages reload.
Restart NSO with the reload option: $ ncs --with-package-reload.
When that is done we can create a service (or modify an existing one) and the callback will be triggered:
Now, have a look at logs/ncs-java-vm.log:
Tailing the ncs-java-vm.log is one way of developing. You can also start and stop the Java VM explicitly and see the trace in the shell. To do this, tell NSO not to start the VM by adding the following snippet to ncs.conf:
Then, after restarting NSO or reloading the configuration, from the shell prompt:
So modifying or creating a VLAN service will now have the "Hello World!" string show up in the shell. You can modify the package, then reload/redeploy, and see the output.
To use Eclipse, a GUI-based IDE, first generate an environment for it:
This will generate two files, .classpath and .project. To add this directory to Eclipse, choose File -> New -> Java Project, uncheck Use default location, and enter the directory where the .classpath and .project files were generated.
We are immediately ready to run this code in Eclipse. All we need to do is choose the main() routine in the NcsJVMLauncher class. The Eclipse debugger now works as usual, and we can start and stop the Java code at will.
Timeouts
A caveat worth mentioning here is that there are a few timeouts between NSO and the Java code that will trigger when we are in the debugger. While developing with the Eclipse debugger and breakpoints, we typically want to disable these timeouts.
First, there are three timeouts in ncs.conf that matter. Set the values of /ncs-config/japi/new-session-timeout, /ncs-config/japi/query-timeout, and /ncs-config/japi/connect-timeout to a large value (see the ncs.conf(5) man page for a detailed description of these settings). If these timeouts are triggered, NSO will close all sockets to the Java VM.
Edit the file and enter the following XML entry just after the Webui entry:
Now, restart ncs, and from now on start it as:
You can verify that the Java VM is not running by checking the package status:
Create a new project and start the launcher main in Eclipse:
You can start and stop the Java VM from Eclipse. Note well that this is not needed, since the change cycle is: modify the Java code, run make in the src directory, and then reload the package, all while NSO and the JVM are running.
Change the VLAN service and see the console output in Eclipse:
Another option is to have Eclipse connect to the running VM. Start the VM manually with the -d option.
Then you can set up Eclipse to connect to the NSO Java VM:
In order for Eclipse to show the NSO code when debugging, add the NSO Source Jars (add external Jar in Eclipse):
Navigate to the service create for the VLAN service and add a breakpoint:
Commit a change of a VLAN service instance and Eclipse will stop at the breakpoint:
So the problem at hand is that we have service parameters and a resulting device configuration. Previously, we showed how to do that with templates. The same principles apply in Java. The service model and the device models are YANG models in NSO irrespective of the underlying protocol. The Java mapping code transforms the service attributes to the corresponding configuration leafs in the device model.
The NAVU API lets the Java programmer navigate the service model and the device models as a DOM tree. Have a look at the create signature:
Two NAVU nodes are passed: the actual service instance (service) and the NSO root (ncsRoot).
We can have a first look at NAVU by analyzing the first try statement:
NAVU is a lazily evaluated DOM tree that represents the instantiated YANG model. So, knowing the NSO model, devices/device (a container and a list) corresponds to the list of managed devices, which can be retrieved by ncsRoot.container("devices").list("device"); from each device entry you can, for example, inspect its list of capabilities.
The service node can be used to fetch the values of the VLAN service instance:
vlan/name
vlan/vlan-id
vlan/device-if/device and vlan/device-if/interface
The first snippet that iterates the service model and prints to the console looks like below:
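A sketch of such a snippet, placed inside the create() callback (leaf and list names follow the vlan.yang model above):

```java
// Read the service input parameters and print them to the console
String serviceName = service.leaf("name").valueAsString();
ConfUInt32 vlanId = (ConfUInt32) service.leaf("vlan-id").value();

for (NavuContainer deviceIf : service.list("device-if").elements()) {
    System.out.println("VLAN " + serviceName + " (id " + vlanId + ")"
        + " device=" + deviceIf.leaf("device").valueAsString()
        + " interface=" + deviceIf.leaf("interface").valueAsString());
}
```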
The com.tailf.conf package contains Java classes representing the YANG types, like ConfUInt32.
Try it out in the following sequence:
Rebuild the Java code: in packages/vlan/src, type make.
Reload the package: in the NSO Cisco CLI, do admin@ncs# packages package vlan redeploy.
Create or modify a vlan service: in the NSO CLI, do admin@ncs(config)# services vlan net-0 vlan-id 844 device-if c0 interface 1/0, and commit.
Remember that the service attribute is passed as a parameter to the create method. As a starting point, look at the first three lines:
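Those first three lines look something like the following sketch (vlan._vlan_id_ is a constant from the namespace class generated during compilation):

```java
// 1: fetch the vlan-id leaf from the service instance
NavuLeaf vlanIdLeaf = service.leaf(vlan._vlan_id_);
// 2: read its value, typed according to the YANG model
ConfUInt32 vlanId = (ConfUInt32) vlanIdLeaf.value();
// 3: cast it to a 16-bit unsigned value for later use
ConfUInt16 vlanId16 = new ConfUInt16((int) vlanId.longValue());
```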
To reach a specific leaf in the model, use the NAVU leaf method with the name of the leaf as a parameter. The leaf then has various methods, such as getting the value as a string.
service.leaf("vlan-id") and service.leaf(vlan._vlan_id_) are two ways of referring to the vlan-id leaf of the service. The latter alternative uses symbols generated by the compilation steps and gives you the benefit of compile-time checking. From this leaf, you can get the value according to the type in the YANG model, ConfUInt32 in this case.
Line 3 shows an example of casting between types. In this case, we prepare the VLAN ID as a 16-bit unsigned integer for later use.
The next step is to iterate over the devices and interfaces. The NAVU elements() method returns the elements of a NAVU list.
In order to write the mapping code, make sure you have an understanding of the device model. One good way of doing that is to create a corresponding configuration on one device and then display it with the pipe target display xpath. Below is a CLI output that shows the model paths for FastEthernet 1/0:
Another useful tool is to render a tree view of the model:
This can then be opened in a Web browser and model paths are shown to the right:
Now, we replace the print statements with setting real configuration on the devices.
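The result is roughly the sketch below. The ios:vlan and ios:interface paths are assumptions based on the CLI output above, and vlanId16 carries over from the earlier snippet:

```java
for (NavuContainer deviceIf : service.list("device-if").elements()) {
    // Follow the device leafref to the /devices/device{name} entry
    NavuContainer deviceContainer =
        (NavuContainer) deviceIf.leaf("device").deref().get(0).getParent();
    NavuContainer cfg = deviceContainer.container("config");

    // Create (shared) the VLAN on the device
    cfg.container("ios:vlan").list("vlan-list")
       .sharedCreate(vlanId16.toString());

    // Use the interface name as a key to check that the interface exists
    String ifName = deviceIf.leaf("interface").valueAsString();
    NavuList feIntfList =
        cfg.container("ios:interface").list("FastEthernet");
    if (feIntfList.containsNode(ifName)) {
        // per-interface configuration is added in the next snippet
    }
}
```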
Let us walk through the above code line by line. The device-name is a leafref. The deref method returns the object that the leafref refers to. The getParent() might surprise the reader. Look at the path for a leafref: /device/name/config/ios:interface/name. The name leafref is the key that identifies a specific interface. The deref returns that key, while we want a reference to the interface (/device/name/config/ios:interface); that is the reason for the getParent().
The next line sets the VLAN list on the device. Note well that this follows the paths displayed earlier using the NSO CLI. The sharedCreate() is important: it creates device configuration based on this service, and it says that other services might also create the same value ("shared"). Shared create maintains reference counters for the created configuration so that service deletion removes the configuration only when the last service is deleted. Finally, the interface name is used as a key to check whether the interface exists, using containsNode().
The last step is to update the VLAN list for each interface. The code below adds an element to the VLAN leaf-list.
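One way to express this, sketched under the assumption that the allowed VLANs are a vlans leaf-list under switchport/trunk/allowed/vlan on the interface:

```java
NavuContainer feIntf = feIntfList.elem(ifName);
NavuLeaf vlansLeaf = feIntf.container("ios:switchport")
    .container("trunk").container("allowed").container("vlan")
    .leaf("vlans");

// Read the current leaf-list value, add our VLAN ID, and write it back
ConfList vlans = (ConfList) vlansLeaf.value();
if (vlans == null) {
    vlans = new ConfList();
}
vlans.addElem(vlanId16);
vlansLeaf.sharedSet(vlans);
```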
Note that the code uses the shared variants, such as sharedCreate() and sharedSet(), instead of create() and set(), as the shared variants are preferred and a best practice for service code.
The above create method is all that is needed for create, read, update, and delete. NSO automatically handles any changes, like changing the VLAN ID, adding an interface to the VLAN service, or deleting the service. This is handled by the FASTMAP engine: it renders any change based on the single definition of the create method.
The mapping strategy using only Java is illustrated in the following figure.
This strategy has some drawbacks:
Managing different device vendors. If we introduced more vendors into the network, this would need to be handled by the Java code. Of course, this can be factored into separate classes in order to keep the general logic clean and just pass the device details to vendor-specific classes, but this gets complex and will always require Java programmers to introduce new device types.
No clear separation of concerns and domain expertise. The general business logic for a service is one thing; detailed configuration knowledge of device types is something else. The latter requires network engineers, while the former is normally handled by a separate team that deals with OSS integration.
Java and templates can be combined:
In this model, the Java layer focuses on required logic, but it never touches concrete device models from various vendors. The vendor-specific details are abstracted away using feature templates. The templates take variables as input from the service logic, and the templates in turn transform these into concrete device configuration. The introduction of a new device type does not affect the Java mapping.
This approach has several benefits:
The service logic can be developed independently of device types.
New device types can be introduced at runtime without affecting service logic.
Separation of concerns and domain expertise: network engineers are comfortable with templates, which look like configuration snippets, and they have expertise in how configuration is applied to real devices. The people defining the service logic are often closer to programmers, since they need to interface with other systems, etc.; this suits a Java layer.
Note that the logic layer does not understand the device types; the templates dynamically apply the correct leg of the template depending on which device is touched.
From an abstraction point of view, we want a template that takes the following variables:
VLAN ID
Device and interface
So the mapping logic can just pass these variables to the feature template and it will apply it to a multi-vendor network.
Create a template as described before.
Create a concrete configuration on a device, or on several devices of different types.
Request NSO to display that as XML.
Replace values with variables.
This results in a feature template like below:
This template only maps to Cisco IOS devices (the xmlns="urn:ios" namespace), but you can add "legs" for other device types at any point in time and reload the package.
Nodes set with a template variable evaluating to the empty string are ignored, e.g., the setting <some-tag>{$VAR}</some-tag> is ignored if the template variable $VAR evaluates to the empty string. However, this does not apply to XPath expressions evaluating to the empty string. A template variable can be surrounded by the XPath function string() if it is desirable to set a node to the empty string.
The Java mapping logic for applying the template is shown below:
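In outline, the logic reduces to a few lines like the sketch below (the template name vlan-feature and the variable names are assumptions and must match the template file):

```java
import com.tailf.ncs.template.Template;
import com.tailf.ncs.template.TemplateVariables;

// Inside create(): pass the feature variables to the template
Template vlanTemplate = new Template(context, "vlan-feature");
TemplateVariables vars = new TemplateVariables();
vars.putQuoted("VLAN_ID", vlanId16.toString());
vars.putQuoted("DEVICE", deviceIf.leaf("device").valueAsString());
vars.putQuoted("INTF_NAME", deviceIf.leaf("interface").valueAsString());
vlanTemplate.apply(service, vars);
```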
Note that the Java code has no clue about the underlying device type; it just passes the feature variables to the template. At run-time, you can update the template with mappings to other device types. The Java code stays untouched; if you modify an existing VLAN service instance to refer to the new device type, the commit will generate the corresponding configuration for that device.
The smart reader will complain: "Why do we have the Java layer at all? This could have been done as a pure template solution." That is true, but this simple Java layer gives room for arbitrarily complex service logic before applying the template.
The steps to build the solution described in this section are:
Create a run-time directory: $ mkdir ~/service-template; cd ~/service-template.
Generate a netsim environment: $ ncs-netsim create-network $NCS_DIR/packages/neds/cisco-ios 3 c.
Generate the NSO runtime environment: $ ncs-setup --netsim-dir ./netsim --dest ./.
Create the VLAN package in the packages directory: $ cd packages; ncs-make-package --service-skeleton java vlan.
Create a template directory in the VLAN package: $ cd vlan; mkdir templates.
Save the above-described template in packages/vlan/templates.
Create the YANG service model according to the above: packages/vlan/src/yang/vlan.yang.
Update the Java code according to the above: packages/vlan/src/java/src/com/example/vlan/vlanRFS.java.
Build the package: in packages/vlan/src, do make.
Start NSO.
This service shows a more elaborate service mapping. It is based on the examples.ncs/service-provider/mpls-vpn example.
MPLS VPNs are a type of Virtual Private Network (VPN) that achieves segmentation of network traffic using Multiprotocol Label Switching (MPLS), often found in Service Provider (SP) networks. The Layer 3 variant uses BGP to connect and distribute routes between sites of the VPN.
The figure below illustrates an example configuration for one leg of the VPN. Configuration items in bold are variables that are generated from the service inputs.
Sometimes the input parameters are enough to generate the corresponding device configurations. But in many cases, this is not enough. The service mapping logic may need to reach out to other data in order to generate the device configuration. This is common in the following scenarios:
Policies: it might make sense to define policies that can be shared between service instances. The policies, for example, QoS, have data models of their own (not service models) and the mapping code reads from that.
Topology Information: the service mapping might need to know connected devices, like which PE the CE is connected to.
Resources like VLAN IDs and IP addresses: these might not be given as input parameters. They can be modeled separately in NSO or fetched from an external system.
It is important to design the service model with the above examples in mind: What is input? What is available from other sources? This example illustrates how to define QoS policies "on the side". A reference to an existing QoS policy is passed as input. This is a much better principle than giving all QoS parameters to every service instance. Note well that if you modify the QoS definitions that services refer to, existing services are not changed automatically. In order to have a service read the changed policies, you need to perform a re-deploy on the service.
This example also uses a list that maps every CE to a PE. This list needs to be populated before any service is created. The service model only has the CE as an input parameter, and the service mapping code performs a lookup in this list to get the PE. If the underlying topology changes, a service re-deploy will adapt the service to the changed CE-PE links. See more on topology below.
NSO has a package to manage resources like VLANs and IP addresses as pools within NSO. In this way, the resources are managed within the transaction. The mapping code could also reach out externally to get resources; nano services are recommended for this.
Using topology information in the instantiation of an NSO service is a common approach, but also an area with many misconceptions. Just as a service in NSO takes a black-box view of the configuration needed for that service in the network, NSO treats topologies in the same way. It is of course common to need to reference topology information in the service, but it is highly desirable to have a decoupled and self-sufficient service that only uses the part of the topology that is relevant to the specific service.
Other parts of the topology could either be handled by other services or left for the network state to sort out; they do not necessarily relate to the configuration of the network. A routing protocol will, for example, handle the IP path through the network.
It is highly desirable not to introduce unneeded dependencies on network topologies in your service.
To illustrate this, let's look at a Layer 3 MPLS VPN service. A logical overview of an MPLS VPN with three endpoints could look something like this: CE routers connect to PE routers, which are connected to an MPLS core network. In the MPLS core network, there are a number of P routers.
In the service model, you only want to configure the CE devices to use as endpoints. In this case, topology information could be used to sort out which PE router each CE router is connected to. However, what type of topology do you need? Let's look at a more detailed picture of what the L1 and L2 topology could look like for one side of the picture above.
In pretty much all networks, there is an access network between the CE and PE routers. In the picture above, the CE routers are connected to local Ethernet switches in a local Ethernet access network, connected through optical equipment to a regional Ethernet access network, which in turn connects to the PE router. Most likely, the physical connections between the devices in this picture have been simplified; in the real world, redundant cabling would be used. The picture above is of course only one example of what an access network can look like, and it is very likely that a service provider has several different access technologies, for example Ethernet, ATM, or DSL-based access networks.
Depending on how you design the L3VPN service, the physical cabling or the exact traffic path taken in the Layer 2 Ethernet access network might not be that interesting, just as we make no assumptions about how traffic is transported over the MPLS core network. In both cases, we trust the underlying protocols to handle state in the network: spanning tree in the Ethernet access network, and routing protocols like BGP in the MPLS cloud. In this case, it could instead make more sense to have a separate NSO service for the access network, both so it can be reused (for example by both L3VPN and L2VPN services) and to avoid tightly coupling the L3VPN service to an access network that can differ (Ethernet, ATM, etc.).
Looking at the topology again from the L3VPN service perspective, if services assume that the access network is already provisioned or taken care of by another service, it could look like this.
The information needed to sort out what PE router a CE router is connected to as well as configuring both CE and PE routers is:
The interface on the CE router that is connected to the PE router, and the IP address of that interface.
The interface on the PE router that is connected to the CE router, and the IP address of that interface.
This section describes the creation of an MPLS L3VPN service in a multi-vendor environment by applying the concepts described above. The example discussed can be found in examples.ncs/service-provider/mpls-vpn. The example network consists of Cisco ASR 9k and Juniper core routers (P and PE) and Cisco IOS-based CE routers.
The goal of the NSO service is to set up an MPLS Layer 3 VPN on a number of CE router endpoints, using BGP as the CE-PE routing protocol. Connectivity between the CE and PE routers is done through a Layer 2 Ethernet access network, which is out of the scope of this service. In a real-world scenario, the access network could, for example, be handled by another service.
In the example network, we can also assume that the MPLS core network already exists and is configured.
When designing service YANG models there are a number of things to take into consideration. The process usually involves the following steps:
Identify the resulting device configurations for a deployed service instance.
Identify what parameters from the device configurations are common and should be put in the service model.
Ensure that the scope of the service and the structure of the model work with the NSO architecture and service mapping concepts. For example, avoid unnecessary complexities in the code to work with the service parameters.
Ensure that the model is structured in a way so that integration with other systems north of NSO works well. For example, ensure that the parameters in the service model map to the needed parameters from an ordering system.
Steps 1 and 2: Device Configurations and Identifying Parameters:
Deploying an MPLS VPN in the network results in the following basic CE and PE configurations. The snippets below only include the Cisco IOS and Cisco IOS-XR configurations. In a real process, all applicable device vendor configurations should be analyzed.
The device configuration parameters that need to be uniquely configured for each VPN have been marked in bold.
Steps 3 and 4: Model Structure and Integration with other Systems:
When configuring a new MPLS L3VPN in the network, we will have to configure all CE routers that should be interconnected by the VPN, as well as the PE routers they connect to.
However, when creating a new l3vpn service instance in NSO, it would be ideal if only the endpoints (CE routers) were needed as parameters, to avoid requiring knowledge about PE routers in a northbound order management system. This means a way to use topology information is needed to derive or compute which PE router a CE router is connected to. This makes the input parameters for a new service instance very simple. It also makes the entire service very flexible, since we can move CE and PE routers around without modifying the service configuration.
Resulting YANG Service Model:
The snippet above contains the l3vpn service model. The structure of the model is very simple. Every VPN has a name, an as-number, and a list of all the endpoints in the VPN. Each endpoint has:
A unique ID.
A reference to a device (a CE router in our case).
A pointer to the LAN local interface on the CE router. This is kept as a string since we want this to work in a multi-vendor environment.
LAN private IP network.
Bandwidth on the VPN connection.
To be able to derive the CE to PE connections we use a very simple topology model. Notice that this YANG snippet does not contain any service point, which means that this is not a service model but rather just a YANG schema letting us store information in CDB.
The model basically contains a list of connections, where each connection points out the device, interface, and IP address in each of the connections.
Since we need to look up which PE routers to configure using the topology model in the mapping logic, it is not possible to use a purely declarative, configuration template-based mapping. Using Java and configuration templates together is the right approach.
The Java logic lets you set a list of parameters that can be consumed by the configuration templates. One huge benefit of this approach is that all the parameters set in the Java code are completely vendor-agnostic. When writing the code, there is no need for knowledge of what kinds of devices or vendors exist in the network, thus creating an abstraction of vendor-specific configuration. This also means that to create the configuration template, there is no need for knowledge of the service logic in the Java code. The configuration template can instead be created and maintained by subject matter experts, the network engineers.
With this service mapping approach, it makes sense to modularize the service mapping by creating configuration templates on a per-feature level, creating an abstraction for each feature in the network. In this example, that means we will create the following templates:
CE router
PE router
This is both to make services easier to maintain and create, and to create components that are reusable from different services. This can of course be made even more fine-grained, with templates for, for example, BGP or interface configuration, if needed.
Since the configuration templates are decoupled from the service logic, it is also possible to create and add additional templates in a running NSO system. For example, you can add a CE router from a new vendor to the Layer 3 VPN service by only adding a new configuration template, using the set of parameters from the service logic, to a running NSO system, without changing anything in the other logical layers.
The Java part of the service mapping is very simple and follows these pseudo-code steps:
This section goes through the relevant parts of the Java code outlined by the pseudo-code above. The code starts by defining the configuration templates and reading the list of configured endpoints and the topology. The NAVU API is used for navigating the data models.
The next step is iterating over the VPN endpoints configured in the service and finding the connected PE router, using small helper methods that navigate the configured topology.
The parameter dictionary is created from the TemplateVariables class and is populated with the appropriate parameters.
The last step after all parameters have been set is applying the templates for the CE and PE routers for this VPN endpoint.
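Sketched in code, these last two steps could look as follows; the template names (l3vpn-ce, l3vpn-pe) and variable names are illustrative assumptions, as the example package defines its own set:

```java
// Populate the vendor-agnostic parameter dictionary for this endpoint
TemplateVariables vpnVar = new TemplateVariables();
vpnVar.putQuoted("PE", peName);
vpnVar.putQuoted("CE", endpoint.leaf("ce-device").valueAsString());
vpnVar.putQuoted("LINK_PE_ADR", peAddress);
vpnVar.putQuoted("LINK_CE_ADR", ceAddress);
vpnVar.putQuoted("BW", endpoint.leaf("bandwidth").valueAsString());

// Apply the per-feature templates for both ends of this VPN leg
Template ceTemplate = new Template(context, "l3vpn-ce");
Template peTemplate = new Template(context, "l3vpn-pe");
ceTemplate.apply(service, vpnVar);
peTemplate.apply(service, vpnVar);
```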
The configuration templates are XML templates based on the structure of the device YANG models. If NSO is connected to a device with the appropriate configuration on it, there is a very easy way to create the configuration templates for the service mapping, using the following steps.
Configure the device with the appropriate configuration.
Add the device to NSO.
Sync the configuration to NSO.
Display the device configuration in XML format.
Save the XML output to a configuration template file and replace the configured values with parameters.
The commands in NSO give the following output. To make the example simpler, only the BGP part of the configuration is used:
The final configuration template with the replaced parameters marked in bold is shown below. If a parameter starts with a $ sign, it is taken from the Java parameter dictionary; otherwise, it is a direct XPath reference to a value from the service instance.
Gather useful information for debugging and troubleshooting.
Progress tracing in NSO provides developers with useful information for debugging, diagnostics, and profiling. This information can be used both during development cycles and after the release of the software. The system overhead for progress tracing is usually negligible.
When a transaction or action is applied, NSO emits progress events. These events can be displayed and recorded in a number of different ways. The easiest way is to pipe an action to details in the CLI.
As seen in the details output, all events are recorded with a timestamp and, in some cases, with a duration. All phases of the transaction, service, and device communication are printed.
Some actions (usually those involving device communication) also produce progress data.
The pipe details in the CLI are useful during development cycles of, for example, a service, but less so when tracing calls from other northbound interfaces or events in a released, running system. Then it is better to configure a progress trace to be output to a file or to operational data, which can be retrieved through a northbound interface.
The top-level container progress is by default invisible due to a hidden attribute. To make progress visible in the CLI, two steps are required:
First, the following XML snippet must be added to ncs.conf:
Then, the unhide command is used in the CLI session:
Progress data can be written to a given file. This is useful when the data is to be analyzed in some third-party software, like a spreadsheet application. The file can be formatted as a comma-separated values (CSV) file, as defined by RFC 4180, or as a pretty-printed log file with each event on a single line.
The location of the file is given by /ncs-config/logs/progress-trace/dir in ncs.conf.
When the data is to be retrieved through a northbound interface, it is more useful to output the progress events as operational data.
This will log non-persistent operational data to the /progress:progress/trace/event list. As this list might grow rapidly, it has a maximum size (default 1000 entries). When the maximum size is reached, the oldest list entry is purged.
The event list can be purged using the /progress:progress/trace/purge action.
Progress events can be subscribed to as Notifications events. See NOTIF API for further details.
The verbosity parameter is used to control the level of output. The following levels are available:
Additional debug tracing can be turned on for various parts. These are consciously left out of the normal debug level due to the high amount of output and should only be turned on during development.
By default, all transaction and action events with the given verbosity level will be logged. To get a more selective choice of events, filters can be used.
The context filter can be used to only log events that originate through a specific northbound interface. The context is one of netconf, cli, webui, snmp, rest, or system, or it can be any other context string defined through the use of MAAPI.
API methods to report progress events exist for Java, Python, and C. There also exist specific methods to report progress events for services.
Optimize NSO for scaling and performance.
With an increasing number of services and managed devices in NSO, performance becomes a more important aspect of the system. At the same time, other aspects, such as the way you organize code, also start playing an important role when using NSO on a bigger scale.
The following section examines these concerns and presents the available options for scaling your NSO automation solution.
NSO allows you to tackle different automation challenges and every solution has its own specifics. Therefore, the best approach to scaling depends on the way the solution is implemented. What works in one case may be useless, or effectively degrade performance, for another. You must first analyze and understand how your particular use case behaves, which will then allow you to take the right approach to scaling.
When trying to improve the performance, a very good, possibly even the best, starting point is to inspect the tracing data. Tracing is further described in Progress Trace. Yet even a simple commit | details command already provides a lot of useful data.
Pay attention to the time NSO spends doing specific tasks. For a simple service, these are mainly:
Validate service data (pre-transform validation)
Run service mapping logic
Validate produced configuration (changeset)
Push changes to affected devices
Commit the new configuration
Tracing data can often quickly reveal a bottleneck, a hidden delay, or some other unexpected inefficiency in your code. The best strategy is to first address any such concerns if they show up since only well-performing code is a good candidate for further optimization. Otherwise, you might find yourself optimizing the wrong parameters and hitting a dead end. Visualizing the progress trace is often helpful in identifying bottlenecks. See Measuring Transaction Throughput.
Analyzing the service in isolation can yield useful insight. But it may also lead you in the wrong direction because some issues only manifest under load and the data from a live system can surprise you. That is why NSO supports different ways of exposing tracing information, including operational data and notification events. Remember to always verify that your observations and assumptions hold for a live, production system, too.
The times for different parts of the transaction, as reported by the tracing data, are very useful in determining where to focus your efforts.
For example, if your service data model uses a very broad must or similar XPath statement, then NSO may potentially need to evaluate thousands of data entries. Such evaluation requires a considerable amount of additional processing and is, in turn, reflected in increased time spent in validation. The solution in this case is to limit the scope of the data referenced in the YANG constraint, which you can often achieve with a more specific XPath expression.
Similarly, if a significant amount of time is spent constructing a service mapping, perhaps there is some redundant work occurring that you could optimize? Sometimes, however, provisioning requires calls to other systems or some computationally expensive operation, which you cannot easily manage without. Then you might want to consider splitting the provisioning process into smaller pieces, using nano services, for example. See Simplify the Per-Device Concurrent Transaction Creation Using a Nano Service for an example use-case and references to the Nano service documentation.
In general, your own code for a single transaction with no additional load on NSO should execute quickly (sub-second, as a rule of thumb). The faster each service or action code is, the better the overall system performance. Using a service design pattern to both improve performance and scale and avoid conflicts is described in Design to Minimize Conflicts.
Things such as reading external data or performing large computations should not be done inside the create code. Consider using an action to encapsulate these functions; a minimal action skeleton is sketched after the list below. An action does not run under the lock unless it triggers a transaction, and it can perform side effects as desired.
There are several ways to utilize an action:
An action is allowed to perform side effects.
An action can read operational data from devices or external systems.
An action can write values to operational data in CDB, for later use from the service.
An action can write configuration to CDB, potentially triggering a service.
Actions can be used together with nano services; see Simplify the Per-Device Concurrent Transaction Creation Using a Nano Service.
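As an illustration of the first points, a minimal Java action skeleton using the standard DP API could look like the sketch below; the callpoint name fetch-external-data is an assumption and must match a tailf:actionpoint in the YANG model:

```java
import com.tailf.conf.ConfObject;
import com.tailf.conf.ConfTag;
import com.tailf.conf.ConfXMLParam;
import com.tailf.dp.DpActionTrans;
import com.tailf.dp.DpCallbackException;
import com.tailf.dp.annotations.ActionCallback;
import com.tailf.dp.proto.ActionCBType;

public class FetchDataAction {

    // Runs outside the transaction lock; a safe place for slow side effects
    @ActionCallback(callPoint = "fetch-external-data",
                    callType = ActionCBType.ACTION)
    public ConfXMLParam[] action(DpActionTrans trans, ConfTag name,
                                 ConfObject[] kp, ConfXMLParam[] params)
            throws DpCallbackException {
        // Call external systems or read device operational data here,
        // then write the result to operational data in CDB for the
        // service create() code to read later.
        return new ConfXMLParam[] {};
    }
}
```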
With the default configuration, one of the first things you might notice standing out in the tracing data is that pushing device configuration takes a significant amount of time compared to other parts of service provisioning. Why is that?
All changes in NSO happen inside a transaction. Network devices participate in the transaction, which gives you the all-or-nothing behavior, to ensure correctness and consistency across the network. But network communication is not instantaneous and a transaction in NSO holds a lock while waiting for devices to process the change. This way, changes to network devices are serialized, even when there are multiple simultaneous transactions. However, a lock blocks other transactions from proceeding, ultimately limiting the overall NSO transaction rate.
So, in many cases, the NSO system is not really resource-constrained but merely experiencing lock contention. Therefore, making locks as short as possible is the best way to improve performance. In the example trace from the section Understanding Your Use Case, most of the time is spent in the prepare phase, where configuration changes are propagated to the network devices. Change propagation requires a management session with each participating device, as well as updating and validating the new configuration on the device side. Understandably, all of these tasks take time.
NSO allows you to influence this behavior. Take a look at Commit Queue on how to avoid long device locks with commit queues and the trade-offs they bring. Usually, enabling the commit queue feature is the first and the most effective step to significantly improving transaction times.
The CDB subscriber mechanism is used to notify the application code about CDB changes and runs at the end of the transaction commit, inside a global lock. Due to this fact, the number and configuration of subscribers affect performance and should be investigated early in your performance optimization efforts.
A badly implemented subscriber prolongs the time the transaction holds the lock, preventing other transactions from completing, in addition to making the original transaction take more time to commit. There are mainly two reasons for suboptimal operation: either the subscriber is too broad and must process too many (irrelevant) changes, or it performs more work inside the lock than necessary. As a recommended practice, the subscriber should only note the changes and schedule the processing to be done later, in order to return and release the lock as quickly as possible.
Moreover, subscribers incur processing overhead regardless of their implementation because NSO needs to communicate with the custom subscriber code, typically written in Java or Python.
That is why modern, performant code in NSO should use the kicker mechanism instead of implementing custom subscribers. While it is still possible to create a badly performing kicker, you are less likely to do so inadvertently. In most situations, kickers are also easier to implement and troubleshoot. You can read more on kickers in Kicker.
The time it takes to complete a transaction is certainly an important performance metric. However, after a certain point, it gets increasingly hard or even impossible to get meaningful improvement from optimizing each individual transaction. As it turns out, on a busy system, there are usually multiple outstanding requests. So, instead of trying to process each as fast as possible one after another, the system might process them in parallel.
In practice and as the figure shows, some parts must still be processed sequentially to ensure transactional properties. However, there is a significant gain in the overall time it takes to process all transactions in a busy system, even though each might take a little longer individually due to the concurrency overhead.
Throughput then becomes a more relevant metric. It is the number of requests or transactions that the system can process in a given time unit. While throughput is still related to individual transaction times, other factors also come into play. An important one is the way in which NSO implements concurrency and the interaction between the transaction system and your (user) code. Designing for transaction throughput is covered in detail later in this section, and the NSO concurrency model is detailed in NSO Concurrency Model.
The section provides guidance on identifying transaction conflicts and what affects their occurrence, so you can make your code more resistant to producing them. Conflicts arise more frequently on busier systems and negatively affect throughput, which makes them a good candidate for optimization.
Depending on the specifics of the server running NSO, additional performance improvement might be possible by fine-tuning the transaction-limits set of configuration parameters in ncs.conf. Please see the ncs.conf(5) manpage for details.
If you are experiencing high resource utilization, such as memory and CPU usage, while individual transactions are optimized to execute fast and the rate of conflicts is low, it's possible you are starting to see the level of demand that pushes the limits of this system.
First, you should try adding more resources, in a scale-up manner, if possible. At the same time, you might also have some services that use an older, less performant user code execution model. For example, the way Python code is executed is controlled by the callpoint-model option, described in The application Component, which you should ensure is set to the most performant setting.
Regardless, a single system cannot scale indefinitely. After you have exhausted all other options, you will need to “scale out,” that is, split the workload across multiple NSO instances. You can achieve this by using the Layered Service Architecture (LSA) approach. But the approach has its trade-offs, so make sure it provides the right benefits in your case. The LSA is further documented in LSA Overview in Layered Service Architecture.
sync-from
In a brownfield environment, where the configuration is not 100% automated and controlled by NSO alone but also written to by other systems or operators, NSO is bound to end up out-of-sync with the device. How to handle synchronization is a big topic, and it is vital to understand what it means to you when things are out of sync. This will help guide your strategy.
If NSO is frequently brought out of sync, it can be tempting to invoke sync-from from the create callback. While this does achieve a higher degree of reliability, in the sense that service modifications won't return an out-of-sync error, the impact on performance is usually catastrophic. A typical sync-from operation takes orders of magnitude longer than a typical service modification, and transactional throughput will suffer greatly.
But other alternatives are often better:
You can synchronize the configuration from the device when it reports a change, rather than when the service is modified, by listening for configuration change events from the device (e.g., via RESTCONF or NETCONF notifications, SNMP traps, or syslog) and invoking sync-from or partial-sync-from when another party (not NSO) has modified the device. See also the section called Partial Sync.
The devices sync-from command does not hold the transaction lock and runs across devices concurrently, which reduces the total amount of time spent synchronizing. This is particularly useful for periodic synchronization, to lower the risk of being out of sync when committing configuration changes.
Using the no-overwrite commit flag, you can be more lax about being in sync and focus on not overwriting modified configuration.
If the configuration is 100% automated and controlled by NSO alone, you can use out-of-sync-behaviour accept to completely ignore whether the device is in sync.
Letting your modification fail with an out-of-sync error and handling that error at the calling side.
Maximal transaction throughput refers to the maximum number of transactions a system can handle within a given period. Factors that can influence maximal transaction throughput include:
Hardware capabilities (e.g., processing power, memory).
Software efficiency.
Network bandwidth.
The complexity of the transactions themselves.
Besides making sure the system hardware capabilities and network bandwidth are not a bottleneck, there are four areas where the NSO user can significantly affect the transaction throughput performance for an NSO node:
Run multiple transactions concurrently: for example, multiple concurrent RESTCONF or NETCONF edits, CLI commits, MAAPI apply() calls, nano service re-deploys, etc.
Design to avoid conflicts and minimize the service create() and validation implementation: for example, in service templates and code mapping to devices or other service instances, or in YANG must statements with XPath expressions or validation code.
Using commit queues to exclude the time to push configuration changes to devices from inside the transaction lock.
Simplify by using nano and stacked services. If the processor where NSO runs a stacked service becomes a severe bottleneck, the added complexity of migrating the stacked service to an LSA setup may be justified. LSA helps expose only a single service instance when scaling up the number of devices, by increasing the number of available CPU cores beyond a single processor.
Measuring transaction performance includes measuring the total wall-clock time for the service deployment transaction(s) and using the detailed NSO progress trace of the transactions to find bottlenecks. The developer log helps debug the NSO internals, and the XPath trace log helps find misbehaving XPath expressions used in, for example, YANG must statements.
The picture below shows a visualization of the NSO progress trace when running a single transaction for two service instances configuring a device each:
The total RESTCONF edit took ~5 seconds, and the service mapping (“creating service” event) and validation (“run validation ...” event) were done sequentially for the service instances and took 2 seconds each. The configuration push to the devices was done concurrently in 1 second.
For progress trace documentation, see Progress Trace.
perf-trans Example Using a Single Transaction
The perf-trans example from the NSO example set explores the opportunities to improve the wall-clock time performance and utilization, as well as opportunities to avoid common pitfalls.
The example uses simulated CPU loads for service creation and validation work. Device work is simulated with sleep() as it will not run on the same processor in a production system.
The example shows how NSO can benefit from running many transactions concurrently if the service and validation code allow concurrency. It uses the NSO progress trace feature to get detailed timing information for the transactions in the system.
The provided code sets up an NSO instance that exports tracing data to a .csv file, provisions one or more service instances that each map to a device, and shows different (average) transaction times plus a graph visualizing the sequences and concurrency.
Play with the perf-trans example by tweaking the measure.py script parameters:
See the README in the perf-trans example for details.
To run the perf-trans example from the NSO example set and recreate the variant shown in the progress trace above:
The following is a sequence diagram and the progress trace of the example, describing the transaction t1. The transaction deploys service configuration to the devices using a single RESTCONF patch request to NSO, after which NSO configures the netsim devices using NETCONF:
The only part running concurrently in the example above was configuring the devices. It is the most straightforward option if transaction throughput performance is not a concern or the service creation and validation work are insignificant. A single transaction service deployment will not need to use commit queues as it is the only transaction holding the transaction lock configuring the devices inside the critical section. See the “holding transaction lock” event in the progress trace above.
Stop NSO and the netsim devices:
Everything from smartphones and tablets to laptops, desktops, and servers now contain multi-core processors. For maximal throughput, the powerful multi-core systems need to be fully utilized. This way, the wall clock time is minimized when deploying service configuration changes to the network, which is usually equated with performance. Therefore, enabling NSO to spread as much work as possible across all available cores becomes important. The goal is to have service deployments maximize their utilization of the total available CPU time to deploy services faster to the users who ordered them.
Close to full utilization of every CPU core when running under maximal load, for example, ten transactions to ten devices, is ideal, as some process viewer tools, such as htop, visualize with meters:
One transaction per RFS instance and device allows each NSO transaction to run concurrently on a separate core: for example, multiple concurrent RESTCONF or NETCONF edits, CLI commits, MAAPI apply() calls, nano service re-deploys, etc. Keep the number of running concurrent transactions equal to or below the number of cores available in the multi-core processor to avoid performance degradation due to increased contention on system internals and resources. NSO helps by limiting the number of transactions applying changes in parallel to, by default, the number of logical processors (e.g., CPU cores). See ncs.conf(5) in Manual Pages under /ncs-config/transaction-limits/max-transactions for details.
Conflicts between transactions and how to avoid them are described in Minimizing Concurrency Conflicts and in detail by the NSO Concurrency Model. While NSO can handle transaction conflicts gracefully with retries, retries affect transaction throughput performance. A simple but effective design pattern to avoid conflicts is to update one device with one Resource Facing Service (RFS) instance where service instances do not read each other's configuration changes.
An overly complex service or validation implementation using templates, code, and XPath expressions increases the processing required and, even if transactions are processed concurrently, will affect the wall-clock time spent processing and, thus, transaction throughput.
When data processing performance is of interest, the best practice rule of thumb is to ensure that must and when statement XPath expressions in YANG models and service templates are only used as necessary and kept as simple as possible.
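As an illustration, a node-local constraint like the hypothetical one below is cheap to evaluate because it never leaves the current node:

    // Hypothetical YANG fragment: a cheap, node-local must constraint
    leaf vlan-id {
      type uint16;
      must ". >= 100 and . <= 199" {
        error-message "vlan-id must be in the 100..199 range";
      }
    }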
If a service creates a significant amount of configuration data for devices, it is often significantly faster to use a single MAAPI shared_set_values() call instead of multiple create() and set() calls or a service template.
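To make the contrast concrete, here is a minimal Python sketch; the rule list in the service model is hypothetical, and the bulk-write entry point is only outlined in a comment, since its exact usage is shown by the perf-setvals example below:

    import ncs

    class ServiceCallbacks(ncs.application.Service):
        @ncs.application.Service.create
        def cb_create(self, tctx, root, service, proplist):
            device = root.devices.device[service.device]

            # Per-node variant: one Python call per created node or leaf.
            # Simple, but the per-call overhead adds up for large configs.
            for i in range(3000):
                rule = device.config.rules.create('rule-%d' % i)  # hypothetical list
                rule.action = 'permit'

            # Bulk variant: encode the same subtree once and write it with a
            # single MAAPI call, along the lines of (signature assumed; see
            # the perf-setvals example for the real usage):
            #
            #   trans = ncs.maagic.get_trans(root)
            #   trans.shared_set_values(values, keypath)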
perf-setvals Example Using a Single Call to MAAPI shared_set_values()
The perf-setvals example writes configuration to an access control list and a route list of a Cisco Adaptive Security Appliance (ASA) device. It uses either MAAPI Python create() and set() calls, Python shared_set_values(), or Java sharedSetValues() to write the configuration in XML format.
To run the perf-setvals example using MAAPI Python create() and set() calls to create 3000 rules and 3000 routes on one device:
The commit uses the no-networking parameter to skip pushing the configuration to the simulated and disproportionately slow Cisco ASA netsim device. The resulting NSO progress trace:
Next, run the perf-setvals example using a single MAAPI Python shared_set_values() call to create 3000 rules and 3000 routes on one device:
The resulting NSO progress trace:
Using the MAAPI shared_set_values() function, the service create callback is, for this example, ~5x faster than using the MAAPI create() and set() functions. The total wall-clock time for the transaction is more than 2x faster, and the difference will increase for larger transactions.
Stop NSO and the netsim devices:
A kicker triggering on a CDB change, a data-kicker, should be used instead of a CDB subscriber when the action taken does not have to run inside the transaction lock, i.e., the critical section of the transaction. A CDB subscriber will be invoked inside the critical section and, thus, will have a negative impact on the transaction throughput. See Improving Subscribers for more details.
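As a sketch, a data-kicker that runs an action when a subtree changes can be configured along these lines; the monitored path, kick-node, and action name are hypothetical, so check the kicker documentation for the exact syntax:

    admin@ncs(config)# kickers data-kicker my-kicker monitor /t3:services \
      kick-node /t3:actions action-name process-change
    admin@ncs(config)# commit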
Writing to devices and other network elements that are slow to configure will stall transaction throughput if you do not enable commit queues, as transactions waiting for the transaction lock to be released cannot start configuring devices before the transaction ahead of them is done writing. For example, if one device is configured using CLI transported with IP over Avian Carriers, a transaction that includes such a device will significantly stall the transactions behind it that go to devices supporting RESTCONF or NETCONF over a fast optical transport. Where transaction throughput performance is a concern, it is wise to choose devices that can be configured efficiently to implement their part of the service configuration.
perf-trans Example Using One Transaction per Device
Dividing the service creation and validation work into two separate transactions, one per device, allows the work to be spread across two CPU cores in a multi-core processor. To run the perf-trans example with the work divided into one transaction per device:
The resulting NSO progress trace:
A sequence diagram with transactions t1 and t2 deploying service configuration to two devices using RESTCONF patch requests to NSO, with NSO configuring the netsim devices using NETCONF:
Note how the service creation and validation work is now divided into 1 second of work per transaction, running concurrently on one CPU core each. However, the two transactions cannot push the configuration concurrently to a device each, as the config push is done inside the critical section, making one of the transactions wait for the other to release the transaction lock. See the two “holding the transaction lock” events in the above progress trace visualization.
To enable transactions to push configuration to devices concurrently, we must enable commit queues.
The concept of a network-wide transaction requires NSO to wait for the managed devices to process the configuration change before exiting the critical section, i.e., before NSO can release the transaction lock. In the meantime, other transactions have to wait their turn to write to CDB and the devices. The commit queue feature avoids waiting for configuration to be written to the device and increases the throughput. For most use cases, commit queues improve transaction throughput significantly.
Writing to a commit queue instead of the device moves the device configuration push outside of the critical region, and the transaction lock can instead be released when the change has been written to the commit queue.
For commit queue documentation, see Commit Queue.
Enabling commit queues allows the two transactions to spread the service creation, validation, and configuration push work across CPU cores in a multi-core processor. Only the CDB write and commit queue write now remain inside the critical section, and the transaction lock is released as soon as the device configuration changes have been written to the commit queues, instead of waiting for the config push to the devices to complete. To run the perf-trans example with the work divided into one transaction per device and commit queues enabled:
The resulting NSO progress trace:
A sequence diagram with transactions t1 and t2 deploying service configuration to two devices using RESTCONF patch requests to NSO, with NSO configuring the netsim devices using NETCONF:
Note how the two transactions now push the configuration concurrently to a device each, as the config push is done outside of the critical section. See the two push configuration events in the above progress trace visualization.
Stop NSO and the netsim devices:
Running the perf-setvals example with two devices and commit queues enabled will produce a similar result.
The perf-trans example service uses one transaction per service instance, where each service instance configures one device. This enables transactions to run concurrently on separate CPU cores in a multi-core processor. The example sends RESTCONF patch requests concurrently to start transactions that run concurrently with the NSO transaction manager. However, dividing the work into multiple processes may not be practical for some applications using the NSO northbound interfaces, e.g., CLI or RESTCONF. Also, it makes a future migration to LSA more complex.
To simplify the NSO manager application, a resource-facing nano service (RFS) can start a process per service instance. The NSO manager application or user can then use a single transaction, e.g., CLI or RESTCONF, to configure multiple service instances where the NSO nano service divides the service instances into transactions running concurrently in separate processes.
The nano service can be straightforward, for example, using a single t3:configured state to invoke a service template or a create() callback. If validation code is required, it can run in a nano service post-action, the t3:validated state, instead of a validation point callback, to keep the validation code in the process created by the nano service.
See Nano Services for Staged Provisioning and Develop and Deploy a Nano Service for nano service documentation.
A Customer Facing Service (CFS) that is stacked with the RFS and maps to one RFS instance per device can simplify the service that is exposed to the NSO northbound interfaces, so that a single NSO northbound interface transaction spawns multiple transactions, for example, one transaction per RFS instance when using the converge-on-re-deploy YANG extension with the nano service behavior tree.
perf-stack Example
The perf-stack example showcases how a CFS on top of a simple resource-facing nano service can be implemented with the perf-trans example by modifying the existing t3 RFS and adding a CFS. Instead of multiple RESTCONF transactions, the example uses a single CLI CFS service commit that updates the desired number of service instances. The commit configures multiple service instances in a single transaction where the nano service runs each service instance in a separate process to allow multiple cores to be used concurrently.
Run as below to start two transactions with a 1-second CPU time workload per transaction in both the service and validation callbacks, each transaction pushing the device configuration to one device, each using a synchronous commit queue, where each device simulates taking 1 second to make the configuration changes to the device:
The above progress trace visualization is truncated to fit, but notice how the t3:validated state action callbacks, the t3:configured state service creation callbacks, and the configuration push from the commit queues run concurrently (on separate CPU cores) when the service deployment is initiated with a single transaction started by the CLI commit.
A sequence diagram describing the transaction t1 deploying service configuration to the devices using the NSO CLI:
The two transactions run concurrently, deploying the service in ~3 seconds (plus some overhead) of wall-clock time. Like the perf-trans example, you can play around with the perf-stack example by tweaking the parameters.
See the README in the perf-stack example for details. For even more details, see the steps in the showcase script.
Stop NSO and the netsim devices:
If the processor where NSO runs becomes a severe bottleneck, the CFS can migrate to a layered service architecture (LSA) setup. The perf-stack example implements stacked services, a CFS abstracting the RFS. It allows for easy migration to an LSA setup to scale with the number of devices or network elements participating in the service deployment. While adding complexity, LSA allows exposing a single CFS instance for all processors instead of one per processor.
Before considering taking on the complexity of a multi-NSO node LSA setup, make sure you have done the following:
Explored all possible avenues of design and optimization improvements described so far in this section.
Measured the transaction performance to find bottlenecks.
Optimized any bottlenecks to reduce their overhead as much as possible.
Observed that the available processor cores are all fully utilized.
Explored running NSO on a more powerful processor with more CPU cores and faster clock speed.
If more devices and RFS instances are created at one point than there are available CPU cores, verified that increasing the number of CPU cores would result in a significant improvement, i.e., that the CPU processing spent on service creation and validation is substantial and is the bottleneck, compared to writing the configuration to CDB and the commit queues and pushing the configuration to the devices.
Migrating to an LSA setup should only be considered after checking all boxes for the above items.
perf-lsa Example
The perf-lsa example builds on the perf-stack example and showcases an LSA setup using two RFS NSO instances, lower-nso-1 and lower-nso-2, with a CFS NSO instance, upper-nso.
You can imagine adding more RFS NSO instances, lower-nso-3, lower-nso-4, etc., to the existing two as the number of devices increases. One NSO instance per multi-core processor and at least one CPU core per device (network element) is likely the most performant setup for this simulated work example. See LSA Overview in Layered Service Architecture for more.
As an example, here is a variant that starts four RFS transactions with a 1-second CPU time workload per transaction in both the service and validation callbacks, each RFS transaction pushing the device configuration to one device using synchronous commit queues, where each device simulates taking 1 second to make the configuration changes to the device:
The three NSO progress trace visualizations show NSO on the CFS and the two RFS nodes. Notice how the CLI commit starts a transaction on the CFS node and configures four service instances with two transactions on each RFS node to push the resulting configuration to four devices.
A sequence diagram describing the transactions t1 and t2 on RFS 1 and t1 and t2 on RFS 2. The transactions deploy service configuration to the devices using the NSO CLI:
The four transactions run concurrently, two per RFS node, performing the work and configuring the four devices in ~3 seconds (plus some overhead) of wall-clock time.
You can play with the perf-lsa example by tweaking the parameters.
See the README in the perf-lsa example for details. For even more details, see the steps in the showcase script.
Stop NSO and the netsim devices:
NSO contains an internal database called CDB, which stores both configuration and operational state data. Understanding the resource consumption of NSO at a steady state is mostly about understanding CDB, as it typically stands for the vast majority of resource usage.
Optimized for fast access, CDB is an in-memory database that holds all data in RAM. It also keeps the data on disk for persistence. The in-memory data structure is optimized for navigating tree data but is still a compact and efficient memory structure. The on-disk format uses a log structure, making it fast to write and very compact.
The in-memory structure usually consumes 2-3x more space than the on-disk format. The on-disk log will grow as more changes are performed in the system. A periodic compaction process compacts the write log and reduces its size. Upon startup of NSO, the on-disk version of CDB will be read, and the in-memory structure will be recreated based on the log. A recently compacted CDB will thus start up faster.
By default, NSO automatically determines when to compact CDB. It is visible in the devel.log when CDB compaction takes place. Compaction may require significant time, during which write transactions cannot be performed. In certain use cases, it may be preferable to disable automatic compaction by CDB and instead trigger compaction manually according to the specific needs. See Compaction for more details.
CDB is a YANG-modeled database. By writing a YANG model, it is possible to store any kind of data in NSO and access it via one of the northbound interfaces of NSO. From this perspective, a service or a device's configuration is like most other YANG-modeled data. The number of service instances in NSO in the steady state affects how much space the data consumes in RAM and on disk.
But keep in mind that services tend to be modified from time to time, and with a higher total number of service instances, changes to those services are more likely. A higher number of service instances means more transactions to deploy changes, which means an increased need for optimizing transactional throughput, available CPU processing, RAM, and disk. See Designing for Maximal Transaction Throughput for details.
In addition to storing instance data, CDB also stores the schema (the YANG models), on disk and reads it into memory on startup. Having a large schema (many or large YANG models) loaded means both disk and RAM will be used, even when starting up an "empty" NSO, i.e., no instance data is stored in CDB.
In particular, device YANG models can be of considerable size. For example, the YANG models in recent versions of Cisco IOS XR have over 750,000 lines. Loading one such NED will consume about 1 GB of RAM and slightly less disk space. In a mixed vendor network, you would load NEDs for all or some of these device types. With CDM, you can have multiple XR NEDs loaded to support communicating with different versions of XR, and similarly for other devices, further consuming resources.
In comparison, most CLI NEDs only model a subset of a device and are, as a result, much smaller, most often under 100,000 lines of YANG.
For small NSO systems, the schema will usually consume more resources than the instance data, and NEDs, in particular, are the most significant contributors to resource consumption. As the system grows and more service and device configurations are added, the percentage of the total resource usage used for NED YANG models will decrease.
Note that the schema is memory mapped into shared memory, so even though multiple Python VMs might be started, memory usage will not increase as it shares memory between different clients. The Java VM uses its own copy of the schema, which is also why we can see that the JVM memory consumption follows the size of the loaded YANG schema.
Accurately predicting the size of CDB means accurately modeling its internal data structure. Since the result will depend on the YANG models and what actual values are stored in the database, the easiest way to understand how the size grows is to start NSO with the schema and data in question and then measure the resource usage.
Performing accurate measurements can be a tedious process or sometimes impossible. When impossible, an estimate can be reached by extrapolating from known data, which is usually much more manageable and accurate enough.
We can look at the disk and RAM used for the running datastore, which stores configuration. On a freshly started NSO, it doesn't occupy much space at all:
After adding a device with a small configuration, in this case a Cisco NXOS switch with about 700 lines of CLI configuration, there is a clear increase:
Compared to the size of CDB before we added the device, we can deduce that the device with its configuration takes up ~214 kB in RAM and 25 kB on disk. Adding 1000 such devices shows how CDB resource consumption increases linearly with more devices. This graph shows the RAM and disk usage of the running datastore in CDB over time. We perform a sequential sync-from operation on the 1000 devices, and while it is executing, we see how resource consumption increases. At the end, resource consumption has reached about 150 MB of RAM and 25 MB of disk, equating to ~150 KiB of RAM and ~25 KiB of disk per device.
The wildcard expansion in the request devices device * sync-from command is processed by the CLI, which iterates over the devices sequentially. This is inefficient and can be sped up by using devices sync-from, which instead processes the devices concurrently. The sequential mode is used here because it produces a graph that better illustrates how this scales.
A device with a larger configuration will consume more space. With a single Juniper MX device that has close to half a million lines of configuration, there is a substantial increase:
Similarly, adding more such devices allows monitoring of how it scales linearly. In the end, with 100 devices, CDB consumes 3.35 GB of RAM and 450 MB of disk, or ~33.5 MiB of RAM and ~4.5 MiB of disk space per device.
Thus, you must do more than dimension your NSO installation based on the number of devices. You must also understand roughly how many resources each device's configuration will consume.
Unless a device uses NETCONF, NSO will not store the configuration as retrieved from the device. When configuration is retrieved, it is parsed by the NED into a structured format.
For example, here is a basic BGP stanza from a Cisco IOS device:
After being parsed by the IOS CLI NED, the equivalent configuration looks like this in NSO:
A single line, such as redistribute connected metric 123 route-map IPV4-REDISTRIBUTE-CONNECTED-TO-BGP, is parsed into a structure of multiple nodes / YANG leaves. There is no exact correlation between the number of lines of configuration and the space it consumes in NSO. The easiest way to determine the resource consumption of a device's configuration is thus to load it into NSO and check the size of CDB before and after.
Forming a rough estimate of CDB resource consumption for planning can be helpful. Divide your devices into categories, get a rough measurement for an exemplar in each category, add a safety margin, e.g., double the resource consumption, and multiply by the number of devices in that category. Example:
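Using the measurements from above with a hypothetical network mix as the worked numbers:

    Category            Devices   Per device (with 2x margin)    Total
    Small (NXOS-like)       500   ~300 KiB RAM, ~50 KiB disk     ~150 MB RAM, ~25 MB disk
    Large (MX-like)         100   ~67 MiB RAM, ~9 MiB disk       ~6.7 GB RAM, ~900 MB disk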
A YANG model describes the input to services, and just like any other data in CDB, it consumes resources. Compared to the typical device configuration, where even small devices often have a few hundred lines of configuration, a small service might only have a handful of configurable inputs. Even extensive services rarely have more than 50 inputs.
When services write configuration, a reverse diff set is generated and saved as part of the service's private data. The more configuration a service writes, the larger its reverse diff set will be and, thus, the more resources it will consume. What appears as a small service with just a handful of inputs could consume considerable resources if it writes a lot of configuration. Similarly, we save a forward diff set by default, contributing to the size. Service metadata attributes, the backpointer list, and the reference count are also added to the written configuration, which consumes some resources. For example, if 50 services all (share)create a node, there will be 50 backpointers in the database, which consumes some space.
As shown above, CDB scales linearly. Modern servers commonly support multiple terabytes of RAM, making it possible to support 50,000 - 100,000 such large router devices in NSO, well beyond the size of any currently existing network. However, beyond consuming RAM and disk space, the size of the CDB also affects the startup time of NSO and certain other operations like upgrades. In the previous example, 100 devices were used, which resulted in a CDB size of 461 MB on disk. Starting that on a standard laptop takes about 100 seconds. With 50,000 devices, CDB on-disk would be over 230 GB, which would take around 6 hours to load on the same laptop, if it had enough RAM. The typical server is considerably faster than the average laptop here, but loading a large CDB will take considerable time.
This also affects the sync/resync time in high availability setups, where the database size increases the data transfer needed.
A working system needs more than just storing the data. It must also be possible to use the devices and services and apply the necessary operations to these for the environment in which they operate. For example, it is common in brownfield environments to frequently run the sync-from action. Most device-related operations, including sync-from, can run concurrently across multiple devices in NSO. Syncing an extensive device configuration will take a few minutes or so. With 50,000 such large devices, we are looking at a total time of tens of hours or even days. Many environments require higher throughput, which could be handled using an LSA setup and spreading the devices over many NSO RFS nodes. sync-from is an example of an action that is easy to scale up and runs concurrently. For example, spreading the 50,000 devices over 5 NSO RFS nodes, each with 10,000 devices, would lead to a speedup close to 5x.
Using LSA, multiple Resource Facing Service (RFS) nodes can be employed to spread the devices across multiple NSO instances. This allows increasing the parallelism in sync-from and other operations, as described in Designing for Maximal Transaction Throughput, making it possible to scale to an almost arbitrary number of devices. Similarly, the services associated with each device are also spread across the RFS nodes, making it possible to operate on them in parallel. Finally, a top CFS node communicates with all RFS nodes, making it possible to administrate the entire setup as one extensive system.
For smooth operation of NSO instances consider all of the following:
Ensure there is enough RAM for NSO to run, with ample headroom.
create() should normally run in a few hundred milliseconds, perhaps a few seconds for extensive services.
Consider splitting into smaller services.
Stacked services allow the composition of many smaller services into a larger service. A common best-practice design pattern is to have one Resource Facing Service (RFS) instance map to one device or network element.
Avoid conflicts between service instances.
Improves performance compared to a single large service for typical modifications.
Only services with changed input will have their create() called.
A small change to the Customer Facing Service (CFS) that results in changes to a subset of the lower services avoids running create() for all lower services.
No external calls or sync-from in create() code.
Use nano-services to do external calls asynchronously.
Never run sync-from from create() code.
Carefully consider the complexity of XPath constraints, in particular around lists; see the sketch after this list.
Avoid XPath expressions with linear scaling or worse.
For example, avoid checking something for every element in a list, as performance will drop radically as the list grows.
XPath expressions involving nested lists or comparisons between lists can lead to quadratic scaling.
Make sure you have an efficient transaction ID method for NEDs.
In the worst case, the NED will compute the transaction ID based on a config hash, which means it will fetch the entire config to compute the transaction ID.
Enable commit queues and ensure transactions utilize as many CPU cores in a multi-core system as possible to increase transactional throughput.
Ensure there are enough file descriptors available.
In many Linux systems, the default limit is 1024.
If we, for example, assume that there are 4 northbound interface ports (CLI, RESTCONF, SNMP, JSON-RPC, or similar) plus a few hundred IPC ports, five times the default, 5 x 1024 == 5120, is a reasonable limit. But one might as well use the next power of two, 8192, to be on the safe side.
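As the sketch referenced in the XPath item above (the model is hypothetical): a must expression that scans the whole list is re-evaluated for every element, so its cost grows quadratically with the list size, while YANG's built-in unique statement expresses the same rule declaratively and far more cheaply:

    list endpoint {
      key name;
      unique "ip";    // cheap, declarative uniqueness
      leaf name { type string; }
      leaf ip { type inet:ipv4-address; }
      // Avoid this: evaluated per element, scanning the entire list
      // must "count(../endpoint[ip = current()/ip]) = 1";
    }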
While a minimal setup with a single CPU core and 1 GB of RAM is enough to start NSO for lab testing and development, it is recommended to have at least 2 CPU cores to avoid CPU contention and to run at least two transactions concurrently, and 4 GB of RAM to be able to load a few NEDs.
Contemporary laptops typically work well for NSO service development.
For production systems it is recommended to have at least 8 CPU cores and with as high clock frequency as possible. This ensures all NSO processes can run without contending for the same CPU cores. More CPU cores enable more transactions to run in parallel on the same processor. For higher-scale systems, an LSA setup should be investigated together with a technical expert. See Designing for Maximal Transaction Throughput.
NSO is not very disk intensive since CDB is loaded into RAM. On startup, CDB is read from disk into memory. Therefore, for fast startups of NSO, rapid backups, and other similar administrative operations, it is recommended to use a fast disk, for example, an NVMe SSD.
Network management protocols typically consume little network bandwidth. It is often less than 10 Mbps but can burst many times that. While 10 Gbps is recommended, 1 Gbps network connectivity will usually suffice. If you use High Availability (HA), the continuous HA updates are typically relatively small and do not consume a lot of bandwidth. A low latency, preferably below 1 ms and well within 10 ms, will significantly impact performance more than increasing bandwidth beyond 1 Gbps. 10 Gbps or more can make a difference for the initial synchronization in case the nodes are not in sync and avoid congestion when doing backups over the network or similar.
The in-memory portion of CDB needs to fit in RAM, and NSO needs working memory to process queries. This is a hard requirement. NSO can only function with enough memory. Less than the required amount of RAM does not lead to performance degradation - it prevents NSO from working. For example, if CDB consumes 50 GB, ensure you have at least 64 GB of RAM. There needs to be some headroom for RAM to allow temporary usage during, for example, heavy queries.
Swapping is a way to use disk space as RAM, and while it can make it possible to start an NSO instance that otherwise would not fit in RAM, it would lead to terrible performance. See Disable Memory Overcommit.
Provide at least 32 GB of RAM and increase with the growth of CDB. As described in Scaling RAM and Disk, the consumption of memory and disk resources for devices and services will vary greatly with the type and size of the service or device.
Develop service packages to run user code.
When setting up an application project, there are several things to think about. A service package needs a service model, NSO configuration files, and mapping code. Similarly, NED packages need YANG files and NED code. We can either copy an existing example and modify that, or we can use the ncs-make-package tool to create an empty skeleton package for us. The ncs-make-package tool provides a good starting point for a development project. Depending on the type of package, we use ncs-make-package to set up a working development structure.
As explained in NSO Packages, NSO runs all user Java code and also loads all data models through an NSO package. Thus, a development project is the same as developing a package. Testing and running the package is done by putting the package in the NSO load path and running NSO.
There are different kinds of packages: NED packages, service packages, etc. Regardless of the package type, the structure of the package as well as the deployment of the package into NSO is the same. The ncs-make-package script creates the following for us:
A Makefile to build the source code of the package. The package contains source code and needs to be built.
If it is a NED package, a netsim directory that is used by the ncs-netsim tool to simulate a network of devices.
If it is a service package, skeleton YANG and Java files that can be modified.
In this section, we will develop an MPLS service for a network of provider edge routers (PE) and customer equipment routers (CE). The assumption is that the routers speak NETCONF and that we have proper YANG modules for the two types of routers. The techniques described here work equally well for devices that speak other protocols than NETCONF, such as Cisco CLI or SNMP.
We first want to create a simulation environment where ConfD is used as a NETCONF server to simulate the routers in our network. We plan to create a network that looks like this:
To create the simulation network, the first thing we need to do is create NSO packages for the two router models. The packages are also exactly what NSO needs to manage the routers.
Assume that the YANG files for the PE routers reside in ./pe-yang-files and the YANG files for the CE routers reside in ./ce-yang-files. The ncs-make-package tool is used to create two device packages, one called pe and the other ce.
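One way to do this with ncs-make-package, assuming the YANG directories above:

    $ ncs-make-package --netconf-ned ./pe-yang-files pe
    $ ncs-make-package --netconf-ned ./ce-yang-files ce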
At this point, we can use the ncs-netsim tool to create a simulation network. ncs-netsim will use the Tail-f ConfD daemon as a NETCONF server to simulate the managed devices, all running on localhost.
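A possible invocation, creating five CE and three PE instances from the two packages; check the ncs-netsim man page for the exact syntax:

    $ ncs-netsim create-network ./ce 5 ce
    $ ncs-netsim add-to-network ./pe 3 pe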
The above commands create a network with 8 routers, 5 running the YANG models for a CE router and 3 running the YANG model for the PE routers. ncs-netsim can be used to stop, start, and manipulate this network. For example:
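A few of the standard ncs-netsim subcommands:

    $ ncs-netsim start
    $ ncs-netsim status
    $ ncs-netsim stop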
ncs-setup
In the previous section, we described how to use ncs-make-package and ncs-netsim to set up a simulation network. Now, we want to use NSO to control and manage the simulated network. We can use the ncs-setup tool to set up a directory suitable for this. ncs-setup has a flag to set up NSO initialization files so that all devices in a ncs-netsim network are added as managed devices to NSO. If we do:
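A typical invocation points ncs-setup at the netsim directory and a target directory, here ./NCS to match the path below:

    $ ncs-setup --netsim-dir ./netsim --dest NCS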
The above command creates db, log, etc., directories and also creates an NSO XML initialization file in ./NCS/ncs-cdb/netsim_devices_init.xml. The init file is important; it is created from the content of the netsim directory, and it contains the IP address, port, authentication credentials, and NED type for all the devices in the netsim environment. There is a dependency order between ncs-setup and ncs-netsim: since ncs-setup creates the XML init file based on the contents of the netsim environment, we must run the ncs-netsim create-network command before we execute the ncs-setup command. Once ncs-setup has been run and the init XML file has been generated, it is possible to manually edit that file.
If we start the NSO CLI, we have, for example:
If we take a look at the directory structure of the generated NETCONF NED packages, we have in ./ce:
It is a NED package, and it has a directory called netsim at the top. This indicates to the ncs-netsim tool that it can create simulation networks that contain devices running the YANG models from this package. This section describes the netsim directory and how to modify it. ncs-netsim uses ConfD to simulate network elements, and to fully understand how to modify a generated netsim directory, some knowledge of how ConfD operates may be required.
The netsim directory contains three files:
confd.conf.netsim is a configuration file for the ConfD instances. The file will be /bin/sed substituted, where the following variables will be replaced with the actual value for that ConfD instance:
%IPC_PORT% - for /confdConfig/confdIpcAddress/port
%NETCONF_SSH_PORT% - for /confdConfig/netconf/transport/ssh/port
%NETCONF_TCP_PORT% - for /confdConfig/netconf/transport/tcp/port
%CLI_SSH_PORT% - for /confdConfig/cli/ssh/port
%SNMP_PORT% - for /confdConfig/snmpAgent/port
%NAME% - for the name of the ConfD instance
%COUNTER% - for the number of the ConfD instance
The Makefile should compile the YANG files so that ConfD can run them. The Makefile should also have an install target that installs all files required for ConfD to run one instance of a simulated network element. This is typically all fxs files.
An optional start.sh file where additional programs can be started. A good example of a package where the netsim component contains some additional C programs is the webserver package in the NSO website example $NCS_DIR/web-server-farm.
Recall the picture of the network we wish to work with: the routers, PE and CE, have an IP address and some additional data. So far, we have generated a simulated network with YANG models. The routers in our simulated network have no data in them, which we can verify by logging in to one of the routers:
The ConfD devices in our simulated network all have a Juniper CLI engine; thus we can, using the command ncs-netsim cli [devicename], log in to an individual router.
To get the routers initialized with this data, we need some additional XML initialization files for the ConfD instances. It is the responsibility of the install target in the netsim Makefile to ensure that each ConfD instance gets initialized with the proper init data. In the NSO example collection, the example $NCS_DIR/examples.ncs/mpls contains precisely the two above-mentioned PE and CE packages, modified so that the network elements in the simulated network get initialized properly.
If we run that example in the NSO example collection, we see:
A fully simulated router network loaded into NSO, with ConfD simulating the 7 routers.
With the scripting mechanism, an end-user can add new functionality to NSO in a plug-and-play-like manner. See Plug-and-play Scripting about the scripting concept in general. It is also possible for a developer of an NSO package to enclose scripts in the package.
Scripts defined in an NSO package work pretty much like system-level scripts configured with the /ncs-config/scripts/dir configuration parameter. The difference is that the location of the scripts is predefined: the scripts directory must be named scripts and must be located in the top directory of the package.
In the complete example examples.ncs/getting-started/developing-with-ncs/11-scripting, there is a README file and a simple post-commit script packages/scripting/scripts/post-commit/show_diff.sh, as well as a simple command script packages/scripting/scripts/command/echo.sh.
So far, we have only talked about packages that describe a managed device, i.e., ned packages. There are also callback, application, and service packages. A service package is a package with some YANG code that models an NSO service, together with Java code that implements the service. See Developing NSO Services.
We can generate a service package skeleton using ncs-make-package, as:
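For example, to generate a Java service skeleton named myrfs, the package name used later in this section:

    $ ncs-make-package --service-skeleton java myrfs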
Make sure that the package is part of the load path, and we can then create test service instances that do nothing.
ncs-make-package will generate skeleton files for our service models and for our service logic. The package is fully buildable and runnable even though the service models are empty. Both the CLI and the Web UI can be run. In addition to this, we also have a simulated environment with ConfD devices configured with YANG modules.
Calling ncs-make-package with the arguments above will create a service skeleton that is placed in the root of the generated service model. However, services can be augmented anywhere or can be located in any YANG module. This can be controlled by giving the argument --augment NAME, where NAME is the path to where the service should be augmented, or, in the case of putting the service as a root container in the service YANG, by giving the argument --root-container NAME.
Services created using ncs-make-package will be of type list. However, it is possible to have services that are of type container instead. A container service needs to be specified as a presence container.
The service implementation logic of a service can be expressed using the Java language. For each such service, a Java class is created. This class should implement the create() callback method from the ServiceCallback interface. This method will be called to implement the service-to-device mapping logic for the service instance.
We declare, in the component for the package, that we have a callback component. In the package-meta-data.xml for the generated package, we have:
When the package is loaded, the NSO Java VM will load the jar files for the package and register the defined class as a callback class. When the user creates a service of this type, the create() method will be called.
In the following sections, we are going to show how to write a service application through several examples. The purpose of these examples is to illustrate the concepts described in previous chapters.
Service Model - a model of the service you want to provide.
Service Validation Logic - a set of validation rules incorporated into your model.
Service Logic - a Java class mapping the service model operations onto the device layer.
If we take a look at the Java code in the service generated by ncs-make-package, first we have the create() method, which takes four parameters. The ServiceContext instance is a container for the current service transaction; with it, e.g., the transaction timeout can be controlled. The service container is a NavuContainer holding a read/write reference to the path in the instance tree containing the current service instance. From this point, you can start accessing all nodes contained within the created service. The root container is a NavuContainer holding a reference to the NSO root. From here, you can access the whole data model of NSO. The opaque parameter contains a java.util.Properties object instance. This object may be used to transfer additional information between consecutive calls to the create callback. It is always null in the first callback method when a service is first created. This Properties object can be updated (or created if null) but should always be returned.
The opaque object is extremely useful for passing information between different invocations of the create() method. The returned Properties object instance is stored persistently. If the create method computes something on its first invocation, it can return that computation to have it passed in as a parameter on the second invocation.
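The same mechanism exists in the Python API, where the opaque is the proplist argument to cb_create(); a minimal sketch, with the helper and property name being hypothetical:

    import ncs

    class ServiceCallbacks(ncs.application.Service):
        @ncs.application.Service.create
        def cb_create(self, tctx, root, service, proplist):
            # proplist is a list of (name, value) string tuples; it plays
            # the role of the Java opaque Properties object and may be
            # empty (or None) on the first invocation.
            props = dict(proplist or [])
            if 'allocated-id' not in props:
                # Computed once, then passed back on later invocations.
                props['allocated-id'] = compute_id(service)  # hypothetical helper
            # The returned list is stored persistently by NSO.
            return [(k, v) for k, v in props.items()]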
This is crucial to understand: the FASTMAP mode described in Mapping Logic relies on the fact that a modification of an existing service instance can be realized as a full deletion of what the service instance created when it was first created, followed by yet another create, this time with slightly different parameters. The NSO transaction engine will then compute the minimal difference and send it southbound to all involved managed devices. Thus, a good service create() method will, when being modified, recreate exactly the same structures it created the first time.
The best way to debug this, and to ensure that a modification of a service instance really only sends the minimal NETCONF diff to the southbound managed devices, is to turn on the NETCONF trace in NSO, modify a service instance, and inspect the XML sent to the managed devices. A badly behaving create() method will incur large reconfigurations of the managed devices, possibly leading to traffic interruptions.
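The trace can be enabled per device; one way, assuming a netsim device named ce0:

    admin@ncs(config)# devices device ce0 trace pretty
    admin@ncs(config)# commit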
It is highly recommended to also implement a selftest() action in conjunction with a service. The purpose of the selftest() action is to trigger a test of the service. The ncs-make-package tool creates a selftest() action that takes no input parameters and has two output parameters.
The selftest() implementation is expected to do some diagnosis of the service. This can possibly include the use of testing equipment or probes.
The NSO Java VM logging functionality is provided using LOG4J. The logging is composed of a configuration file (log4j2.xml) where static settings are made, i.e., all settings that can be done for LOG4J (see https://logging.apache.org/log4j/2.x for more comprehensive log settings). There are also dynamically configurable log settings under /java-vm/java-logging.
When we start the NSO Java VM in main(), the log4j2.xml log file is parsed by the LOG4J framework, and it applies the static settings to the NSO Java VM environment. The file is searched for in the Java CLASSPATH.
The NSO Java VM starts several internal processes or threads. One of these threads executes a service called NcsLogger, which handles the dynamic configuration of the logging framework. When NcsLogger starts, it initially reads all the configurations from /java-vm/java-logging and applies them, thus overwriting settings that were previously parsed by the LOG4J framework.
After it has applied the changes from the configuration, it starts to listen to changes that are made under /java-vm/java-logging.
The LOG4J framework has 8 verbosity levels: ALL, DEBUG, ERROR, FATAL, INFO, OFF, TRACE, and WARN. They have the following relations: ALL > TRACE > DEBUG > INFO > WARN > ERROR > FATAL > OFF. This means that the highest verbosity we can have is the level ALL, and the lowest, no traces at all, is OFF. There are corresponding enumerations for each LOG4J verbosity level in tailf-ncs.yang; thus, NcsLogger does the mapping between the enumeration type log-level-type and the LOG4J verbosity levels.
To change a verbosity level, one needs to create a logger. A logger is something that controls the logging of certain parts of the NSO Java API.
The loggers in the system are hierarchically structured, which means that there is one root logger that always exists. All descendants of the root logger inherit their settings from the root logger if the descendant logger doesn't overwrite its settings explicitly.
The LOG4J loggers are mapped to the package level in the NSO Java API, so the root logger that exists has a direct descendant, which is the package com, and it in turn has a descendant com.tailf.
The com.tailf logger has a direct descendant that corresponds to every package in the system, for example com.tailf.cdb, com.tailf.maapi, etc.
One could configure a logger in the static settings, that is, in a log4j2.properties file; this would mean that we need to explicitly restart the NSO Java VM. Alternatively, one could configure a logger dynamically if an NSO restart is not desired.
Recall that if a logger is not configured explicitly then it will inherit its settings from its predecessors. To overwrite a logger setting we create a logger in NSO.
To create a logger, let's say that one uses the Maapi API to read and write configuration changes in NSO and wants to show all traces, including INFO level traces. To enable INFO traces for the Maapi classes (located in the package com.tailf.maapi) during runtime, we start, for example, a CLI session and create a logger called com.tailf.maapi.
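A sketch of what this can look like in the CLI; the level enum name follows log-level-type in tailf-ncs.yang and is assumed here:

    admin@ncs(config)# java-vm java-logging logger com.tailf.maapi level level-info
    admin@ncs(config)# commit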
When we commit our changes to CDB, NcsLogger will notice that a change has been made under /java-vm/java-logging, and it will then apply the logging settings to the logger com.tailf.maapi that we just created. We explicitly set the INFO level on that logger. All descendants of com.tailf.maapi will automatically inherit their settings from that logger.
So where do the traces go? With the default configuration (in log4j2.properties), appender.dest1.type=Console, the LOG4J framework forwards all traces to stdout/stderr.
In NSO, all stdout/stderr first goes through the service manager. The service manager has a configuration under /java-vm/stdout-capture that controls where the stdout/stderr will end up.
The default setting is a file called ./ncs-java-vm.log.
It is important to consider that when creating a logger (in this case, com.tailf.maapi), the name of the logger has to be an existing package known by the NSO classloader.
One could also create a logger named com.tailf with some desired level. This would set all packages (com.tailf.*) to the same level. A common usage is to set com.tailf to level INFO, which would set all traces, including INFO, from all packages to level INFO.
If one would like to turn off all available traces in the system (quiet mode), then configure com.tailf (or com) to level OFF.
There are INFO level messages in all parts of the NSO Java API, ERROR level messages when an exception occurs, and some warning messages (level WARN) in some places in packages.
There are also protocol traces between the Java API and NSO, which can be enabled by creating a logger com.tailf.conf with the DEBUG trace level.
When processing in the java-vm fails, the exception error message is reported back to NCS. This can be more or less informative, depending on how elaborate the message in the thrown exception is. Also, the exception can be wrapped one or several times, with the original exception indicated as the root cause of the wrapped exception.
In debugging and error reporting, these root cause messages can be valuable for understanding what actually happens in the Java code. On the other hand, in normal operations, just a top-level message without too many details is preferred. The exceptions are also always logged in the java-vm log, but if this log is large, it can be troublesome to correlate a certain exception to a specific action in NCS. For this reason, it is possible to configure the level of detail shown by NCS for a java-vm exception. The leaf /ncs:java-vm/exception-error-message/verbosity takes one of three values:
standard: Show the message from the top exception. This is the default.
verbose: Show all messages for the chain of cause exceptions, if any.
trace: Show messages for the chain of cause exceptions with exception class and the trace for the bottom root cause.
Here is an example of how this can be used. In the web-site service example, we try to create a service without the necessary preparations:
NSO will, at first start, take the packages found in the load path and copy them into a directory under the supervision of NSO, located at ./state/packages-in-use. Later starts of NSO will not take any new copies from the package load path, so changes will not take effect by default. The reason for this is that in normal operation, changing the package definitions as a side effect of a restart is unwanted behavior. Instead, these types of changes are part of an NSO installation upgrade.
During package development, as opposed to operations, it is usually desirable that all changes to package definitions in the package load path take effect immediately. There are two ways to make this happen. Either start ncs with the --with-reload-packages directive:
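A minimal invocation:

    $ ncs --with-reload-packages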
Or, set the environment variable NCS_RELOAD_PACKAGES, for example like this:
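For example:

    $ export NCS_RELOAD_PACKAGES=true
    $ ncs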
It is strongly recommended to use the NCS_RELOAD_PACKAGES environment variable approach, since it guarantees that the packages are updated in all situations.
It is also possible to request a running NSO to reload all its packages:
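From the CLI, in operational mode:

    admin@ncs# packages reload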
This request can only be performed in operational mode, and the effect is that all packages will be updated, and any change in YANG models or code will be effectuated. If any YANG models are changed, an automatic CDB data upgrade will be executed. If manual (user code) data upgrades are necessary, the package should contain an upgrade component. This upgrade component will be executed as part of the package reload. See Writing an Upgrade Package Component for information on how to develop an upgrade component.
If the change in a package does not affect the data model or shared Java code, there is another command:
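For example, for the myrfs package used later in this section:

    admin@ncs# packages package myrfs redeploy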
This will redeploy the private JARs in the Java VM for the Java package, restart the Python VM for the Python package, and reload the templates associated with the package. However, this command will not be sensitive to changes in the YANG models or shared JARs for the Java package.
By default, NCS will start the Java VM by invoking the command $NCS_DIR/bin/ncs-start-java-vm. That script will invoke:
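In essence, something like the following; the classpath is elided, and the launcher's package name is an assumption based on the class named below:

    $ java ... com.tailf.ncs.NcsJVMLauncher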
The class NcsJVMLauncher contains the main() method. The started Java VM will automatically retrieve and deploy all Java code for the packages defined in the load path in the ncs.conf file. No other specification than the package-meta-data.xml for each package is needed.
In the NSO CLI, there exist several settings and actions for the NSO Java VM; if we do:
We see some of the settings that are used to control how the NSO Java VM runs. In particular, here we are interested in /java-vm/stdout-capture/file. The NSO daemon will, when it starts, also start the NSO Java VM, and it will capture the stdout output from the NSO Java VM and send it to the file ./logs/ncs-java-vm.log. For more details on the Java VM settings, see The NSO Java VM.
Thus, if we tail -f that file, we get all the output from the Java code. That leads us to the first and most simple way of developing Java code. If we now:
Edit our Java code.
Recompile that code in the package, e.g., cd ./packages/myrfs/src; make
Restart the Java code, either by telling NSO to restart the entire NSO Java VM from the NSO CLI (note that this requires the environment variable NCS_RELOAD_PACKAGES=true):
Or instructing NSO to just redeploy the package we're currently working on.
We can then do tail -f logs/ncs-java-vm.log to check for printouts and log messages. Typically, there is quite a lot of data in the NSO Java VM log, and it can sometimes be hard to find our own printouts and log messages. Therefore, it can be convenient to use the command below, which will make the relevant exception stack traces visible in the CLI.
It is also possible to dynamically, from the CLI, control the level of logging as well as which Java packages shall log. Say that we are interested in Maapi calls but do not want the log cluttered with what are really NSO Java library internal calls. We can then do:
Now, considerably less log data will come. If we want these settings to always be there, even if we restart NSO from scratch with an empty database (no .cdb files in ./ncs-cdb), we can save these settings as XML and put that XML file inside the ncs-cdb directory; that way, ncs will use this data as initialization data on a fresh restart. We do:
The ncs-setup --reset command stops the NSO daemon and resets NSO back to factory defaults. A restart of NSO will reinitialize NSO from all XML files found in the CDB directory.
It is possible to tell NSO to not start the NSO Java VM at all. This is interesting in two different scenarios: first, if we want to run the NSO Java code embedded in a larger application, such as a Java application server (JBoss); second, when debugging a package.
First, we configure NSO to not start the NSO Java VM at all by adding the following snippet to ncs.conf:
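The relevant entry is the java-vm auto-start setting:

    <java-vm>
      <auto-start>false</auto-start>
    </java-vm>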
Now, after a restart or a configuration reload, no Java code is running. If we do:
We will see that the oper-status of the packages is java-uninitialized. We can also do:
This is expected since we have told NSO to not start the NSO Java VM. Now, we can do that manually, at the UNIX shell prompt:
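Using the same script NSO itself uses to start the Java VM:

    $ ncs-start-java-vm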
So, now we're in a position where we can manually stop the NSO Java VM, recompile the Java code, and restart the NSO Java VM. This development cycle works fine. However, even though we're running the NSO Java VM standalone, we can still redeploy packages from the NSO CLI to reload and restart just our Java code, (no need to restart the NSO Java VM).
Since we can run the NSO Java VM standalone in a UNIX shell, we can also run it inside Eclipse. If we stand in an NSO project directory, like the NCS directory generated earlier in this section, we can issue the command:
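This ncs-setup flag generates the Eclipse project files mentioned below:

    $ ncs-setup --eclipse-setup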
This will generate two files, .classpath and .project. If we add this directory to Eclipse as a File -> New -> Java Project, uncheck Use the default location, and enter the directory where the .classpath and .project files have been generated, we are immediately ready to run this code in Eclipse. All we need to do is to choose the main() routine in the NcsJVMLauncher class.
The Eclipse debugger now works as usual, and we can, at will, start and stop the Java code. One caveat worth mentioning is that there are a few timeouts between NSO and the Java code that will trigger when we sit in the debugger. While developing with the Eclipse debugger and breakpoints, we typically want to disable all these timeouts.
First, we have three timeouts in ncs.conf that matter. Copy the system ncs.conf and set the three values of the following to a large value. See the ncs.conf(5) man page for a detailed description of what those values are.
If these timeouts are triggered, NSO will close all sockets to the Java VM and all bets are off.
Edit the file and enter the following XML entry just after the Web UI entry.
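A sketch of such an entry; the three timeouts are assumed to be the Java API session timeouts, so verify the element names against ncs.conf(5):

    <japi>
      <new-session-timeout>PT1000S</new-session-timeout>
      <query-timeout>PT1000S</query-timeout>
      <connect-timeout>PT1000S</connect-timeout>
    </japi>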
Now, restart NCS.
We also have a few timeouts that are dynamically reconfigurable from the CLI. We do:
Then, to save these settings so that NCS will have them again on a clean restart (no CDB files):
The Eclipse Java debugger can connect remotely to an NSO Java VM and debug it. This requires that the NSO Java VM has been started with some additional flags. By default, the script in $NCS_DIR/bin/ncs-start-java-vm is used to start the NSO Java VM. If we provide the -d flag, we will launch the NSO Java VM with:
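That is, with the standard JDWP agent enabled and listening on the port used below; the exact flag string is an assumption:

    -agentlib:jdwp=transport=dt_socket,address=9000,server=y,suspend=n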
This is what is needed to be able to remotely connect to the NSO Java VM. In the ncs.conf file:
Now, if we, in Eclipse, add a debug configuration and connect to port 9000 on localhost, we can attach the Eclipse debugger to an already running system and debug it remotely.
ncs-project
An NSO project is a complete running NSO installation. It contains all the needed packages and the config data that is required to run the system.
By using the ncs-project commands, the project can be populated with the necessary packages and kept updated. This can be used for encapsulating NSO demos or even a full-blown turn-key system.
For a developer, the typical workflow looks like this:
Create a new project using the ncs-project create command.
Define what packages to use in the project-meta-data.xml file.
Fetch any remote packages with the ncs-project update command.
Prepare any initial data and/or config files.
Run the application.
Possibly export the project for somebody else to run.
Using the ncs-project create command, a new project is created. The file project-meta-data.xml should be updated with relevant information, as will be described below. The project will also get a default ncs.conf configuration file that can be edited to better match different scenarios. All files and directories should be put into a version control system, such as Git.
A directory called test_project is created, containing the files and directories of an NSO project as shown below:
The Makefile contains targets for building, starting, stopping, and cleaning the system. It also contains targets for entering the CLI as well as some useful targets for dealing with any Git packages. Study the Makefile to learn more.
Any initial CDB data can be put in the init_data directory. The Makefile will copy any files in this directory to ncs-cdb before starting NSO.
There is also a test directory created with a directory structure used for automatic tests. These tests are dependent on the test tool Lux.
To fill this project with anything meaningful, the project-meta-data.xml
file needs to be edited.
The project version number is configurable; the version we get from the create command is 1.0. The description should also be changed to a short text explaining what the project is intended for. Our initial content of the project-meta-data.xml file may now look like this:
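A sketch of the initial content (the namespace is assumed to be the one declared in tailf-ncs-project.yang):

```xml
<project-meta-data xmlns="http://tail-f.com/ns/ncs-project">
  <name>test_project</name>
  <project-version>1.0</project-version>
  <description>Demonstration of the ncs-project workflow</description>
</project-meta-data>
```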
For this example, let's say we have a released package: ncs-4.1.2-cisco-ios-4.1.5.tar.gz
, a package located in a remote git repository foo.git
, and a local package that we have developed ourselves: mypack
. The relevant part of our project-meta-data.xml
file would then look like this:
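A sketch of the packages part; the element layout follows the settings list at the end of this section (name plus one of local, url, or git), while the URLs are placeholders:

```xml
<package>
  <name>cisco-ios</name>
  <url>file:///tmp/ncs-4.1.2-cisco-ios-4.1.5.tar.gz</url>
</package>
<package>
  <name>foo</name>
  <git>
    <repo>ssh://git@example.com/foo.git</repo>
    <branch>master</branch>
  </git>
</package>
<package>
  <name>mypack</name>
  <local/>
</package>
```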
By specifying netsim devices in the project-meta-data.xml
file, the necessary commands for creating the netsim configuration will be generated in the setup.mk
file that ncs-project update
creates. The setup.mk
file is included in the top Makefile
, and provides some useful make targets for creating and deleting our netsim setup.
When done editing the project-meta-data.xml
, run the command ncs-project update
. Add the -v
switch to see what the command does.
Answer yes
when asked to overwrite the setup.mk
. After this, a new runtime directory is created with NCS and simulated devices configured. You are now ready to compile your system with: make all
.
If you have a lot of packages, all located in the same Git repository, it is convenient to specify the repository just once. This can be done by adding a packages-store
section as shown below:
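A sketch, with placeholder repository details:

```xml
<packages-store>
  <git>
    <repo>ssh://git@example.com/my-packages</repo>
    <branch>stable</branch>
  </git>
</packages-store>
```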
This means that if a package does not have a git repository defined, the repository and branch in the packages-store are used.
If a package has specified that it is dependent on some other packages in its package-meta-data.xml
file, ncs-project update
will try to clone those packages from any of the specified packages-store
. To override this behavior, specify explicitly all packages in your project-meta-data.xml
file.
When the development is done, the project can be bundled together and distributed further. For this purpose, ncs-project comes with the export command. The export command creates a tarball of the required files and any extra files, as specified in the project-meta-data.xml file.
Developers are encouraged to distribute the project, either via some source code management system, like Git, or by exporting bundles using the export command.
When using export
, a subset of the packages should be configured for exporting. The reason for not exporting all packages in a project is if some of the packages are used solely for testing or similar. When configuring the bundle the packages included in the bundle are leafrefs to the packages defined at the root of the model, see the example below (The NSO Project YANG model). We can also define a specific tag, commit, or branch, even a different location for the packages, different from the one used while developing. For example, we might develop against an experimental branch of a repository, but bundle with a specific release of that same repository.
Bundled packages specified with type file:// or url:// will not be built; they are simply included as-is by the export command.
The bundle also has a name and a list of included files. Unless another name is specified from the command line, the final compressed file will be named using the configured bundle name and project version.
We create the tar-ball by using the export
command:
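For example, from the project directory:

```
$ ncs-project export
```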
There are two ways to make use of a bundle:
Together with the ncs-project create --from-bundle=<bundlefile>
command.
Extract the included packages using tar for manual installation in an NSO deployment.
In the first scenario, it is possible to create an NSO project, populated with the packages from the bundle, to create a ready-to-run NSO system. The optional init_data
part makes it possible to prepare CDB with configuration, before starting the system the very first time. The project-meta-data.xml
file will specify all the packages as local to avoid any dangling pointers to non-accessible git repositories.
The second scenario is intended for the case when you want to install the packages manually, or via a custom process, into your running NSO systems.
The --snapshot switch adds a timestamp to the name of the created bundle file, to make it clear that it is not a proper version-numbered release.
To import our exported project we would do an ncs-project create
and point out where the bundle is located.
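For example, assuming the bundle created above:

```
$ ncs-project create --from-bundle=test_project-1.0.tar.gz
```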
ncs-project has a full set of man pages that describe its usage and syntax. Below is an overview of the commands, which are explained in more detail further down.
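As found in the ncs-project man pages, the subcommands are:

ncs-project create: Create a new NSO project.
ncs-project update: Populate the project with packages and keep them updated.
ncs-project git: Run a Git command across the project's Git packages.
ncs-project export: Bundle the project for distribution.
ncs-project setup: Perform project setup (invoked from the generated Makefile).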
project-meta-data.xml File
The project-meta-data.xml file defines the project metadata for an NSO project, according to the $NCS_DIR/src/ncs/ncs_config/tailf-ncs-project.yang YANG model. See the tailf-ncs-project.yang module, where all options are described in more detail. To get an overview, use an IETF RFC 8340-based YANG tree diagram.
Below is a list of the settings in tailf-ncs-project.yang that are configured through the metadata file. A detailed description can be found in the YANG model. Note that the order of the XML entries in a project-meta-data.xml file must be the same as in the model.
- name: Unique name of the project.
- project-version: The version of the project. This is for administrative purposes only.
- packages-store:
  - directory: Paths for package dependencies.
  - git:
    - repo: Default git package repositories.
    - branch, tag, or commit ID.
- netsim: Netsim devices used by the project, to generate a proper Makefile running the ncs-project setup script.
  - device
  - prefix
  - num-devices
- bundle: Information used to collect files and packages and pack them in a tarball bundle.
  - name: Tarball filename.
  - includes: Files to include.
  - package: Packages to include (leafref to the package list below).
    - name: Name of the package.
    - local, url, or git: Where to get the package. The Git option needs a branch, tag, or commit ID.
- package: Packages used by the project.
  - name: Name of the package.
  - local, url, or git: Where to get the package. The Git option needs a branch, tag, or commit ID.
Manipulate NSO alarm table using the dedicated Alarm APIs.
This section focuses on how to manipulate the NSO alarm table using the dedicated Alarm APIs. Make sure that the concepts in the Alarm Manager introduction are well understood before reading this section.
The Alarm API provides a simplified way of managing your alarms for the most common alarm management use cases. The API is divided into a producer and a consumer part.
The producer part provides an alarm sink. Using an alarm sink, you can submit your alarms into the system. The alarms are then queued and fed into the NSO alarm list. You can have multiple alarm sinks active at any time.
The consumer part provides an Alarm Source. The alarm source lets you listen to new alarms and alarm changes. As with the producer side, you can have multiple alarm sources listening for new and changed alarms in parallel.
The diagram below shows a high-level view of the flow of alarms in and out of the system. Alarms are received, e.g. as SNMP notifications, and fed into the NSO Alarm List. At the other end, you subscribe for the alarm changes.
The producer part of the Alarm API can be used in the following modes:
Centralized Mode: This is the preferred mode for NSO. In the centralized mode, we submit alarms to a central alarm writer that optimizes the number of sessions towards the CDB. The NSO Java VM will set up the centralized alarm sink at start-up, which will be available to all Java components run by the NSO Java VM.
Local Mode: In the local mode, we submit alarms directly into the CDB. In this case, each alarm sink keeps its own CDB session. This is the recommended mode for applications run outside of the NSO Java VM, or for Java components that have a specific need to control the CDB session.
The difference between the two modes is manifested by the way you retrieve the AlarmSink instance to use for alarm submission. For submitting an alarm in centralized mode, a prerequisite is that a central alarm sink has been set up within your JVM. For components in the NSO Java VM, this is done for you. For applications outside of the NSO Java VM that want to use the centralized mode, you need to get an AlarmSinkCentral instance. This instance has to be started, and the central will then execute in a separate thread. The application needs to maintain this instance and stop it when the application finishes.
The centralized alarm sink can then be retrieved using the default constructor in the AlarmSink
class.
When submitting an alarm using the local mode, you need a CDB socket and a Cdb
instance. The local mode alarm sink needs the Cdb
instance to write alarm info to CDB. The local alarm sink is retrieved using a constructor with a Cdb
instance as an argument.
The sink.submitAlarm(...)
method provided by the AlarmSink
instance can be used in both centralized and local mode to submit an alarm.
Below is an example showing how to submit alarms using the centralized mode, which is the normal scenario for components running inside the NSO Java VM. In the example, we create an alarm sink and submit an alarm.
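A sketch of what such a submission could look like. The class names come from the com.tailf.ncs.alarmman packages, but the exact submitAlarm parameter list, and the generated NcsAlarms namespace class used for the alarm type, are assumed here and should be checked against the AlarmSink Javadoc:

```java
import com.tailf.conf.ConfBuf;
import com.tailf.conf.ConfIdentityRef;
import com.tailf.ncs.alarmman.common.ManagedDevice;
import com.tailf.ncs.alarmman.common.ManagedObject;
import com.tailf.ncs.alarmman.common.PerceivedSeverity;
import com.tailf.ncs.alarmman.producer.AlarmSink;

// Inside the NSO Java VM, the central sink already exists, so the
// default constructor is enough to get a centralized-mode sink.
AlarmSink sink = new AlarmSink();

// Identify which device and object the alarm concerns.
ManagedDevice managedDevice = new ManagedDevice("device0");
ManagedObject managedObject =
    new ManagedObject("/devices/device{device0}/config");

// Submit the alarm. The alarm type is an identityref; the trailing
// arguments (specific problem, severity, text, ...) are assumed.
sink.submitAlarm(managedDevice, managedObject,
                 new ConfIdentityRef(NcsAlarms.hash,
                                     NcsAlarms._ncs_dev_manager_alarm),
                 new ConfBuf(""),
                 PerceivedSeverity.WARNING,
                 "Something is wrong",
                 false, null, null);
```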
In contrast to the alarm sink, the alarm source operates only in centralized mode. Therefore, before being able to consume alarms using the Alarm API, you need to set up a central alarm source. If you are executing components in the scope of the NSO Java VM, this central alarm source is already set up for you.
You typically set up a central alarm source if you have a stand-alone application executing outside the NSO Java VM. Setting up a central alarm source is similar to setting up a central alarm sink: you need to retrieve an AlarmSourceCentral instance. Your application needs to maintain this instance, which implies starting it at initialization and stopping it when the application finishes.
The central alarm source subscribes to changes in the alarm list and forwards them to the instantiated alarm sources. The alarms are broadcast to the alarm sources. This means that each alarm source will receive its own copy of the alarm.
The alarm source promotes two ways of receiving alarms:
Take: Block execution until an alarm is received.
Poll: Wait for an alarm with a timeout. If no alarm is received within the stated time frame, the call returns.
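A minimal consumer sketch. The AlarmSource class is from com.tailf.ncs.alarmman.consumer, but the exact take/poll method names and signatures are assumed and should be checked against the Javadoc:

```java
import com.tailf.ncs.alarmman.common.Alarm;
import com.tailf.ncs.alarmman.consumer.AlarmSource;

// Inside the NSO Java VM, the central alarm source is already running,
// so a new AlarmSource starts receiving (and buffering) alarms at once.
AlarmSource source = new AlarmSource();

// Take: block until the next new or changed alarm arrives.
Alarm alarm = source.takeAlarm();

// Poll: wait at most ten seconds, returning null on timeout
// (method name and timeout unit are assumed -- check the Javadoc).
Alarm maybe = source.pollAlarm(10000);
```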
As soon as you create an alarm source object, the alarm source object will start receiving alarms. If you do not poll or take any alarms from the alarm source object, the queue will fill up until it reaches the maximum number of queued alarms as specified by the alarm source central. The alarm source central will then start to drop the oldest alarms until the alarm source starts the retrieval. This only affects the alarm source that is lagging behind. Any other alarm sources that are active at the same time will receive alarms without discontinuation.
The NSO alarm manager is extendable. NSO itself has a number of built-in alarms. The user can add user-defined alarms. In the website example, we have a small YANG module that extends the set of alarm types.
We have in the module my-alarms.yang
the following alarm type extension:
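A sketch of such an extension, assuming NSO alarm types are rooted in the al:alarm-type identity from tailf-ncs-alarms.yang (module name and namespace are placeholders):

```yang
module my-alarms {
  namespace "http://example.com/my-alarms";
  prefix ma;

  import tailf-ncs-alarms { prefix al; }

  // New alarm type, usable as an identityref in the NSO alarm list.
  identity website-alarm {
    base al:alarm-type;
    description "Alarms raised by the website example.";
  }
}
```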
The identity statement in the YANG language is used for this type of construct. To complete our alarm type extension, we also need to populate configuration data related to the new alarm type. A good way to do that is to provide XML data in a CDB initialization file and place this file in the ncs-cdb directory:
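A sketch of such an init file; the tailf-ncs-alarms namespace and the leaf names under alarm-model are assumed here and should be checked against tailf-ncs-alarms.yang:

```xml
<alarms xmlns="http://tail-f.com/ns/ncs-alarms">
  <alarm-model>
    <alarm-type>
      <type xmlns:ma="http://example.com/my-alarms">ma:website-alarm</type>
      <event-type>communicationsAlarm</event-type>
      <has-clear>true</has-clear>
      <kind-of-alarm>root-cause</kind-of-alarm>
      <probable-cause>0</probable-cause>
    </alarm-type>
  </alarm-model>
</alarms>
```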
Another possibility of extension is to add fields to the existing NSO alarms. This can be useful if you want to add extra fields for attributes not directly supported by the NSO alarm list.
Below is an example showing how to extend the alarm and the alarm status.
One of the strengths of the NSO model structure is the correlation capabilities. Whenever NSO FASTMAP creates a new service it creates a back pointer reference to the service that caused the device modification to take place. NSO template-based services will generate these pointers by default. For Java-based services, back pointers are created when the createdShared
method is used. These pointers can be retrieved and used as input to the impacted objects parameter of a raised alarm.
The impacted objects of the alarm are the objects that are affected by the alarm, i.e., objects that depend on the alarming objects or the root cause objects. For NSO, this typically means services that have created the device configuration. An impacted object should therefore point to a service that may suffer from this alarm.
The root cause object is another important object of the alarm. It describes the object that is likely the original cause of the alarm. Note that this is not the same thing as the alarming object. The alarming object is the object that raised the alarm, while the root cause object is the primary suspect for causing the alarm. In NSO, any object can raise alarms: a service, a device, or something else.
API documentation for JSON-RPC API.
The JSON-RPC 2.0 specification contains all the details you need to understand the protocol, but a short version is given here:
A request payload typically looks like this:
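For example (a standard JSON-RPC 2.0 request; method and params are illustrative):

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "subtract",
  "params": {"subtrahend": 23, "minuend": 42}
}
```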
where the method and params properties are as defined in this manual page.
A response payload typically looks like this:
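For example, on success:

```json
{"jsonrpc": "2.0", "id": 1, "result": 19}
```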
Or:
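For example, on failure (the type property is an NSO extension, described below; the token shown is illustrative):

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "error": {
    "code": -32601,
    "type": "rpc.method.not_found",
    "message": "Method not found"
  }
}
```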
The request id
param is returned as-is in the response to make it easy to pair requests and responses.
The batch JSON-RPC standard depends on matching requests and responses by id, since the server processes requests in any order it sees fit, e.g.:
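An illustrative batch request (methods and params are placeholders):

```json
[
  {"jsonrpc": "2.0", "id": 1, "method": "subtract", "params": [42, 23]},
  {"jsonrpc": "2.0", "id": 2, "method": "add", "params": [20, 22]}
]
```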
With a possible response like (first result for add
, the second result for subtract
):
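An illustrative response; note that the order differs from the request, and the id fields pair them up:

```json
[
  {"jsonrpc": "2.0", "id": 2, "result": 42},
  {"jsonrpc": "2.0", "id": 1, "result": 19}
]
```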
The URL for the JSON-RPC API is `/jsonrpc`
. For logging and debugging purposes, you can add anything as a subpath to the URL, for example turning the URL into `/jsonrpc/<method>`
which will allow you to see the exact method in different browsers' Developer Tools - Network tab - Name column, rather than just an opaque jsonrpc
.
For brevity, in the upcoming descriptions of each method, only the input params
and the output result
are mentioned, although they are part of a fully formed JSON-RPC payload.
Authorization is based on HTTP cookies. The response to a successful call to login
would create a session, and set an HTTP-only cookie, and even an HTTP-only secure cookie over HTTPS, named sessionid
. All subsequent calls are authorized by the presence and the validity of this cookie.
The th
param is a transaction handle identifier as returned from a call to new_read_trans
or new_write_trans
.
The comet_id
param is a unique ID (decided by the client) that must be given first in a call to the comet
method, and then to upcoming calls which trigger comet notifications.
The handle param needs to have a semantic value (not just a counter) prefixed with the comet ID (for disambiguation), and overrides the handle that would otherwise have been returned by the call. This gives more freedom to the client and enables semantic handles.
The JSON-RPC specification defines the following error code
values:
-32700
- Invalid JSON was received by the server. An error occurred on the server while parsing the JSON text.
-32600
- The JSON sent is not a valid Request object.
-32601
- The method does not exist/is not available.
-32602
- Invalid method parameter(s).
-32603
- Internal JSON-RPC error.
-32000
to -32099
- Reserved for application-defined errors (see below).
To make server errors easier to read, along with the numeric code
, we use a type
param that yields a literal error token. For all application-defined errors, the code
is always -32000
. It's best to ignore the code
and just use the type
param.
For example, a failing call results in a payload like the following:
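(The type token shown is illustrative.)

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "error": {
    "code": -32000,
    "type": "trans.invalid_th",
    "message": "Invalid transaction handle"
  }
}
```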
The message
param is a free text string in English meant for human consumption, which is a one-to-one match with the type
param. To remove noise from the examples, this param is omitted from the following descriptions.
An additional method-specific data
param may be added to give further details on the error, most predominantly a reason
param which is also a free text string in English meant for human consumption. To remove noise from the examples, this param is omitted from the following descriptions. However any additional data
params will be noted by each method description.
All methods may return one of the following JSON-RPC or application-defined errors, in addition to other errors specific to each method.
NSO Web UI development information.
Web UI development is expected to be in the hands of the customer's front-end developers. They know best the requirements and how to fulfill them in terms of aesthetics, functionality, and toolchain (frameworks, libraries, external data sources, and services).
NSO comes with a northbound interface in the shape of a JSON-RPC API. This API is designed with Web UI applications in mind, and it complies with the JSON-RPC 2.0 specification while using HTTP/S as the transport mechanism.
The JSON-RPC API contains a handful of methods with well-defined input method
and params
, along with the output result
.
In addition, the API also implements a Comet model, as long polling, to allow the client to subscribe to different server events and receive event notifications about those events in near real-time.
You can call these methods from a browser via AJAX (e.g., XMLHttpRequest or the Fetch API), or from the command line (e.g., using curl).
You can read about all the available methods and their signatures in the JSON-RPC API section, but here is a working example of what a common flow would look like:
Log in.
Create a new read transaction.
Read a value.
Create a new WebUI (read-write) transaction, in preparation for changing the value.
Change a value.
Commit (save) the changes.
Meanwhile, subscribe to changes and receive a notification.
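A condensed sketch of the first steps using curl. The method names match the JSON-RPC section (login, new_read_trans, get_value), while the host, credentials, path, and transaction handle are placeholders:

```bash
# 1. Log in; the sessionid cookie authorizes subsequent calls
curl -c cookies.txt -s http://127.0.0.1:8080/jsonrpc/login \
  -H 'Content-Type: application/json' \
  -d '{"jsonrpc": "2.0", "id": 1, "method": "login",
       "params": {"user": "admin", "passwd": "admin"}}'

# 2. Create a read transaction and note the returned transaction handle
curl -b cookies.txt -s http://127.0.0.1:8080/jsonrpc/new_read_trans \
  -H 'Content-Type: application/json' \
  -d '{"jsonrpc": "2.0", "id": 2, "method": "new_read_trans",
       "params": {"db": "running"}}'

# 3. Read a value using the th returned in step 2
curl -b cookies.txt -s http://127.0.0.1:8080/jsonrpc/get_value \
  -H 'Content-Type: application/json' \
  -d '{"jsonrpc": "2.0", "id": 3, "method": "get_value",
       "params": {"th": 1, "path": "/ncs:devices/device{ce0}/address"}}'
```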
In the release package, under ${NCS_DIR}/var/ncs/webui/example
, you will find the working code to run the example below.
In the example above describing a common flow, a reference is made to using a JSON-RPC client to make the RPC calls.
An example implementation of a JSON-RPC client, used in the example above:
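A minimal sketch of such a client using the browser's Fetch API; this is not the exact code from the release package:

```javascript
// Minimal JSON-RPC over HTTP client. The sessionid cookie set by the
// login method is sent automatically on same-origin requests.
let nextId = 0;

async function jsonrpc(method, params) {
  const response = await fetch('/jsonrpc/' + method, {
    method: 'POST',
    headers: {'Content-Type': 'application/json'},
    body: JSON.stringify({jsonrpc: '2.0', id: ++nextId, method, params})
  });
  const payload = await response.json();
  if (payload.error) {
    throw payload.error;   // {code, type, message, ...}
  }
  return payload.result;
}

// Usage sketch: log in, open a read transaction, read a value.
// jsonrpc('login', {user: 'admin', passwd: 'admin'})
//   .then(() => jsonrpc('new_read_trans', {db: 'running'}))
//   .then(({th}) => jsonrpc('get_value', {th, path: '/some/path'}));
```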
In the example above describing a common flow, a reference is made to starting a Comet channel and subscribing to changes on a specific path.
An example implementation of a Comet client, used in the example above:
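A sketch of a long-poll loop built on the client above. The subscribe_changes, start_subscription, and comet method names are taken from the JSON-RPC section, but the exact parameter names should be verified there:

```javascript
// Subscribe to changes under a path, then loop on the comet method.
async function watch(path) {
  const comet_id = 'main-' + Date.now();          // client-chosen unique ID
  const handle = comet_id + '-changes-' + path;   // semantic handle

  await jsonrpc('subscribe_changes', {comet_id, handle, path});
  await jsonrpc('start_subscription', {handle});

  for (;;) {
    // Blocks server-side until notifications arrive (or a timeout).
    const notifications = await jsonrpc('comet', {comet_id});
    for (const n of notifications) {
      console.log('notification', n);
    }
  }
}
```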
The Single Sign-On functionality enables users to log in via HTTP-based northbound APIs with a single sign-on authentication scheme, such as SAMLv2. Currently, it is only supported for the JSON-RPC northbound interface.
When enabled, the endpoint /sso
is made public and handles Single Sign-on attempts.
An example configuration for the cisco-nso-saml2-auth Authentication Package is presented below. Note that /ncs-config/aaa/auth-order
does not need to be set for Single Sign-On to work!
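A sketch of what the ncs.conf fragment could look like, assuming the Package Authentication settings live under /ncs-config/aaa; the element names should be checked against ncs.conf(5):

```xml
<aaa>
  <package-authentication>
    <enabled>true</enabled>
    <packages>
      <package>cisco-nso-saml2-auth</package>
    </packages>
  </package-authentication>
</aaa>
```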
A client attempting single sign-on authentication should request the /sso
endpoint and then follow the continued authentication operation from there. For example, for cisco-nso-saml2-auth
, the client is redirected to an Identity Provider (IdP), which subsequently handles the authentication, and then redirects the client back to the /sso
endpoint to validate the authentication and set up the session.
Nano services use kickers to trigger the execution of state callback code, run templates, and execute actions according to a plan when pre-conditions are met. For more information, see the nano services documentation.
JSON-RPC runs on top of the embedded web server, which accepts HTTP and/or HTTPS.
When used in a browser, the JSON-RPC API does not accept cross-domain requests by default, but can be configured to do so via the custom-headers functionality in the embedded web server, or by adding a reverse proxy.
You can read more about command rules in the AAA documentation.
Note: If the permission to delete is denied on a child, the 'warnings' array in the result will contain the warning 'Some elements could not be removed due to NACM rules prohibiting access.'. The delete method will still delete as much as is allowed by the rules. See the AAA documentation for more information about permissions and authorization.
Note: If this method is used for deletion and the permission to delete is denied on a child, the 'warnings' array in the result will contain the warning 'Some elements could not be removed due to NACM rules prohibiting access.'. The delete will still remove as much as is allowed by the rules. See the AAA documentation for more information about permissions and authorization.
Retrieves the result of a query (as chunks). For more details on queries, read the description of the start_query method.
Note: If the permission to roll back is denied on some nodes, the 'warnings' array in the result will contain the warning 'Some changes could not be applied due to NACM rules prohibiting access.'. The install_rollback method will still roll back as much as is allowed by the rules. See the AAA documentation for more information about permissions and authorization.
Note: If the permission to roll back is denied on some nodes, the 'warnings' array in the result will contain the warning 'Some changes could not be applied due to NACM rules prohibiting access.'. The load_rollback method will still roll back as much as is allowed by the rules. See the AAA documentation for more information about permissions and authorization.
Note: See the error recovery documentation for a more detailed explanation.
For Single Sign-On to work, the Package Authentication feature needs to be enabled (see the Package Authentication documentation).
An embedded basic web server can be used to deliver static and Common Gateway Interface (CGI) dynamic content to a web client, such as a web browser. See the web server documentation for more information.
Level | Description
---|---
normal | Informational messages that highlight the progress of the system at a coarse-grained level. Used mainly to give a high-level overview. This is the default and the lowest verbosity level.
verbose | Detailed informational messages from the system. The various service and device phases and their duration will be traced. This is useful to get an overview of where time is spent in the system.
very-verbose | Very detailed informational messages from the system and its internal operations.
debug | The highest verbosity level with fine-grained informational messages usable for debugging the system and its internal operations. Internal system transactions as well as data kicker evaluation and CDB subscribers will be traced. Setting this level could result in a large number of events being generated.
Device Type | RAM | Disk | Number of Devices | Margin | Total RAM | Total Disk
---|---|---|---|---|---|---
FTTB access switch | 200KiB | 25KiB | 30000 | 100% | 11718MiB | 1464MiB
Mobile Base Station | 120KiB | 11KiB | 15000 | 100% | 3515MiB | 322MiB
Business CPE | 50KiB | 4KiB | 50000 | 50% | 3662MiB | 292MiB
PE / Edge Router | 10MiB | 1MiB | 1000 | 25% | 12GiB | 1.2GiB
Total | | | | | 20.6GiB | 3.3GiB
Develop your own NEDs to integrate unsupported devices in your network.
NSO knows how to automatically communicate southbound to NETCONF and SNMP-enabled devices. By supplying NSO with the YANG models of a NETCONF device, NSO knows the data models of the device, and through the NETCONF protocol knows exactly how to manipulate the device configuration. This can be used for a NETCONF device such as a Juniper router, any device that uses ConfD as a management system, or any other device that runs a compliant NETCONF server. Similarly, by providing NSO with the MIBs for a device, NSO can automatically manage such a device.
Unfortunately, the majority of existing devices in current networks do not speak NETCONF, and SNMP is mostly used to retrieve data from devices. By far the most common way to configure network devices is through the CLI. Management systems typically connect over SSH to the CLI of the device and issue a series of CLI configuration commands. Some devices do not even have a CLI, and thus SNMP, or even worse, various proprietary protocols, are used to configure the device.
NSO can speak southbound not only to NETCONF-enabled devices, but through the NED architecture it can speak to an arbitrary management interface. This is not entirely automatic like with NETCONF, and depending on the type of interface the device has for configuration, this may involve some programming. SNMP devices can be managed automatically, by supplying NSO with the MIBs for the device, with some additional declarative annotations. Devices with a Cisco-style CLI can be managed by writing YANG models describing the data in the CLI, and a relatively thin layer of Java code to handle the communication to the devices. Other types of devices require more coding.
The NSO architecture is described in the picture below, with a built-in NED for NETCONF, another built-in NED for SNMP, one NED for Cisco CLIs, and a generic NED for other protocols. The NED is the adaptation layer between the XML representation of the network configuration contained inside NSO and the wire protocol between NSO and managed devices. The NETCONF and SNMP NEDs are built in, the CLI NED is entirely model-driven, whereas the generic NED requires a Java program to translate operations on the NSO XML tree into configuration operations toward the device. Depending on what means are used to configure the device, this may be more or less complicated.
NSO can use SNMP to configure a managed device, under certain circumstances. SNMP in general is not suitable for configuration, and it is important to understand why:
In SNMP, the size of a SET request, which is used to write to a device, is limited to what fits into one UDP packet. This means that a large configuration change must be split into many packets. Each such packet contains some parameters to set, and each such packet is applied on its own by the device. If one SET request out of many fails, there is no abort command to undo the already applied changes, meaning that rollback is very difficult.
The data modeling language used in SNMP, SMIv2, does not distinguish between configuration objects and other writable objects. This means that it is not possible to retrieve only the configuration from a device without explicit, exact knowledge of all objects in all MIBs supported by the device.
SNMP supports only two basic operations, read and write. There is no protocol support for creating or deleting data. Such operations must be modeled in the MIBs, explicitly.
SMIv2 has limited support for semantic constraints in the data model. This means that it is difficult to know if a certain configuration will apply cleanly on a device. If it doesn't, rollback is tricky, as explained above.
Because of all of the above, ordering of SET requests becomes very important. If a device refuses to create some object A before another B, an SNMP manager must make sure to create B before creating A. It is also common that objects cannot be modified without first making them disabled or inactive. There is no standard way to do this, so again, different data models do this in different ways.
Despite all this, if a device can be configured over SNMP, NSO can use its built-in multilingual SNMP manager to communicate with the device. However, to solve the problems mentioned above, the MIBs supported by the device need to be carefully annotated with some additional information that instructs NSO on how to write configuration data to the device. This additional information is described in detail below.
To add a device, the following steps need to be followed. They are described in more detail in the following sections.
(See the Makefile snmp-ned/basic/packages/ex-snmp-ned/src/Makefile for an example of the description below.) Make sure that you have all MIBs available, including import dependencies, and that they contain no errors.
The ncsc --ncs-compile-mib-bundle compiler is used to compile MIBs and MIB annotation files into NSO load files. Assuming a directory with input MIB files (and optional MIB annotation files) exists, the following command compiles all the MIBs in device-models and writes the output to ncs-device-model-dir.
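For example (directory names as used in the text; verify the flags against the ncsc man page):

```bash
$ ncsc --ncs-compile-mib-bundle device-models \
      --ncs-device-dir ./ncs-device-model-dir
```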
The compilation steps performed by the ncsc --ncs-compile-mib-bundle
are elaborated below:
Transform the MIBs into YANG according to the IETF standardized mapping (https://www.ietf.org/rfc/rfc6643.txt). The IETF-defined mapping makes all MIB objects read-only over NETCONF.
Generate YANG deviations from the MIB; this makes SMIv2 read-write objects config true in YANG, as a YANG deviation.
Include the optional MIB annotations.
Merge the read-only YANG from step 1 with the read-write deviation from step 2.
Compile the merged YANG files into NSO load format.
These steps are illustrated in the figure below:
Finally make sure that the NSO configuration file points to the correct device model directory:
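For example, a load-path entry in ncs.conf (sketch):

```xml
<load-path>
  <dir>./ncs-device-model-dir</dir>
</load-path>
```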
Each managed device is configured with a name, IP address, and port (161 by default), and the SNMP version to use (v1, v2c, or v3).
To minimize the necessary configuration, the authentication group concept (see Authentication Groups) is used also for SNMP. A configured managed device of the type snmp
refers to an SNMP authgroup. An SNMP authgroup contains community strings for SNMP v1 and v2c and USM parameters for SNMP v3.
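A sketch of such a configuration in the CLI; the structure follows tailf-ncs-devices.yang, but the exact layout should be verified, and the password values are placeholders:

```
devices authgroups snmp-group my-authgroup
 default-map community-name public
 umap admin
  usm remote-name admin
  usm security-level auth-priv
  usm auth md5 remote-password $8$PLACEHOLDER
  usm priv des remote-password $8$PLACEHOLDER
 !
!
devices device r3
 device-type snmp version v3
 device-type snmp snmp-authgroup my-authgroup
!
```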
In the example above, when NSO needs to speak to the device r3
, it sees that the device is of type snmp
, and that SNMP v3 should be used with authentication parameters from the SNMP authgroup my-authgroup
. This authgroup maps the local NSO user admin
to the USM user admin
, with explicit remote passwords given. These passwords will be localized for each SNMP engine that NSO communicates with. While the passwords above are shown encrypted, when you enter them in the CLI you write them in clear text. Note also that the remote engine ID is not configured; NSO performs a discovery process to find it automatically.
No NSO user other than admin
is mapped by the authgroup my-authgroup
for SNMP v3.
With SNMP, there is no standardized, generic way for an SNMP manager to learn which MIBs an SNMP agent implements. By default, NSO assumes that an SNMP device implements all MIBs known to NSO, i.e., all MIBs that have been compiled with the ncsc --ncs-compile-mib-bundle
command. This works just fine if all SNMP devices NSO manages are of the same type, and implement the same set of MIBs. But if NSO is configured to manage many different SNMP devices, some other mechanism is needed.
In NSO, this problem is solved by using MIB groups. A MIB group is a named collection of MIB module names. A managed SNMP device can refer to one or more MIB groups. For example, below, two MIB groups are defined:
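A sketch (group names are illustrative; the second group uses the wildcard discussed below):

```
devices mib-group basic
 mib-module [ BASIC-CONFIG-MIB ]
!
devices mib-group snmp
 mib-module [ SNMP* ]
!
```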
The wildcard *
can be used only at the end of a string; it is thus used to define a prefix of the MIB module name. So the string SNMP*
matches all loaded standard SNMP modules, such as SNMPv2-MIB, SNMP-TARGET-MIB, etc.
An SNMP device can then be configured to refer to one or more of the MIB groups:
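For example:

```
devices device r3
 device-type snmp mib-group [ basic snmp ]
!
```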
Most annotations for MIB objects are used to instruct NSO on how to split a large transaction into suitable SNMP SET requests. This step is not necessary for a default integration. But when, for example, ordering dependencies are discovered in the MIB, it is better to add these as annotations and let NSO handle the ordering, rather than leaving it to the CLI user or Java programmer.
In some cases, NSO can automatically understand when rows in a table must be created or deleted before rows in some other table. Specifically, NSO understands that if table B has an INDEX object in table A (i.e., B sparsely augments A), then rows in table B must be created after rows in table A, and vice versa for deletions. NSO also understands that if table B AUGMENTS table A, then a row in table A must be created before any column in B is modified.
However, in some MIBs, table dependencies cannot be detected automatically. In this case, these tables must be annotated with a sort-priority
. By default, all rows have sort-priority 0. If table A has a lower sort priority than table B, then rows in table A are created before rows in table B.
In some tables, existing rows cannot be modified unless the row is first inactivated. Once inactive, the row can be modified and then activated again. Unfortunately, there is no formal way to declare this in SMIv2, so these tables must be annotated with two statements: ned-set-before-row-modification and ned-modification-dependent. The former is used to instruct NSO which column and which value are used to inactivate a row, and the latter is used on each column that requires the row to be inactivated before modification. ned-modification-dependent can be used in the same table as ned-set-before-row-modification, or in a table that augments or sparsely augments the table with ned-set-before-row-modification.
By default, NSO treats a writable SMIv2 object as configuration, except if the object is of type RowStatus. Any writable object that does not represent configuration must be listed in a MIB annotation file when the MIB is compiled, with the "operational" modifier.
When NSO retrieves data from an SNMP device, e.g., when doing a sync from-device
, it uses the GET-NEXT request to scan the table for available rows. When doing the GET-NEXT, NSO must ask for an accessible column. If the row has a column of type RowStatus, NSO uses this column. Otherwise, if one of the INDEX objects is accessible, it uses this object. Otherwise, if the table has been annotated with ned-accessible-column
, this column is used. And, as a last resort, NSO does not indicate any column in the first GET-NEXT request, and uses the column returned from the device in subsequent requests. If the table has "holes" for this column, i.e., the column is not instantiated in all rows, NSO will not detect those rows.
NSO can automatically create and delete table rows for tables that use the RowStatus TEXTUAL-CONVENTION, defined in RFC 2580.
It is pretty common to mix configuration objects with non-configuration objects in MIBs. Specifically, it is quite common that rows are created automatically by the device, but then some columns in the row are treated as configuration data. In this case, the application programmer must tell NSO to sync from the device before attempting to modify the configuration columns, to let NSO learn which rows exist on the device.
Some SNMP agents require a certain order of row deletions and creations. By default, the SNMP NED sends all creates before deletes. The annotation ned-delete-before-create
can be used on a table entry to send row deletions before row creations, for that table.
Sometimes rows in some SNMP agents cannot be modified once created. Such rows can be marked with the annotation ned-recreate-when-modified
. This makes the SNMP NED to first delete the row, and then immediately recreate it with the new values.
A good starting point for understanding annotations is to look at the example in the examples.ncs/snmp-ned directory. The BASIC-CONFIG-MIB MIB has a table where rows can only be modified if bscActAdminState is set to locked. To have NSO do this automatically when modifying entries, rather than leaving it to users, an annotation file can be created. See BASIC-CONFIG-MIB.miba, which contains the following:
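A sketch of the annotation file contents (see mib_annotations(5) for the authoritative syntax):

```
## BASIC-CONFIG-MIB.miba
bscActAdminState  ned-set-before-row-modification = locked
bscActFlow        ned-modification-dependent
```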
This tells NSO that before modifying the bscActFlow column, it must set bscActAdminState to locked, and restore the previous value after committing the set operation.
All MIB annotations for a particular MIB are written to a file with the file suffix .miba
. See mib_annotations(5) in manual pages for details.
Make sure that the MIB annotation file is put into the directory containing all the MIB files that is given as input to the ncsc --ncs-compile-mib-bundle command.
NSO can manage SNMP devices within transactions; a transaction can span Cisco devices, NETCONF devices, and SNMP devices. If a transaction fails, NSO generates the reverse operation to the SNMP device.
The basic features of the SNMP NED are illustrated below using the examples.ncs/snmp-ned example. First, try to connect to all SNMP devices:
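For example:

```
admin@ncs# devices connect
```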
When NSO executes the connect request for SNMP devices, it performs a get-next request with 1.1 as the var-bind. When working with the SNMP NED, it is helpful to turn on NED tracing:
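For example:

```
admin@ncs(config)# devices global-settings trace raw
admin@ncs(config)# commit
```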
This creates a trace file named ned-devicename.trace. The trace for the NCS connect action looks like this:
When looking at SNMP trace files it is useful to have the OBJECT-DESCRIPTOR rather than the OBJECT-IDENTIFIER. To do this, pipe the trace file to the smixlate
tool:
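For example (the MIB search path is illustrative):

```
$ cat logs/ned-r3.trace | smixlate $NCS_DIR/src/ncs/snmp/mibs/*.mib
```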
You can access the data in the SNMP systems directly (read-only and read-write objects):
NSO can synchronize all writable objects into CDB:
All the standard features of NSO with transactions and roll-backs will work with SNMP devices. The sequence below shows how to enable authentication traps for all devices as one transaction. If any device fails, NSO will automatically roll back the others. At the end of the CLI sequence a manual rollback is shown:
Each managed device in NSO has a device type, which informs NSO how to communicate with the device. The device type is one of netconf
, snmp
, cli
, or generic
. In addition, a special ned-id
identifier is needed.
NSO uses a technique called YANG Schema Mount, where all the data models from a device are mounted into the /devices
tree in NSO. Each set of mounted data models is completely separated from the others (they are confined to a "mount jail"). This makes it possible to load different versions of the same YANG module for different devices. The functionality is called Common Data Models (CDM).
In most cases, there are many devices running the same software version in the network managed by NSO, thus using the exact same set of YANG modules. With CDM, all YANG modules for a certain device (or family of devices) are contained in a NED package (or just NED for short). If the YANG modules on the device are updated in a backward-compatible way, the NED is also updated.
However, if the YANG modules on the device are updated in an incompatible way in a new version of the device's software, it might be necessary to create a new NED package for the new set of modules. Without CDM, this would not be possible, since there would be two different packages that contained different versions of the same YANG module.
When a NED is being built, its YANG modules are compiled to be mounted into the NSO YANG model. This is done by device compilation of the device's YANG modules and is performed via the ncsc
tool provided by NSO.
The ned-id identifier is a YANG identity, which must be derived from one of the pre-defined identities in tailf-ncs-ned.yang (such as ned:cli-ned-id or ned:generic-ned-id).
A YANG model for devices handled by NED code needs to extend the base identity and provide a new identity that can be configured.
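A sketch of such a module, using the ned:cli-ned-id base mentioned later in this section; the module name and namespace are placeholders:

```yang
module acme-ios-id {
  namespace "http://example.com/acme-ios-id";
  prefix acme-ios-id;

  import tailf-ncs-ned { prefix ned; }

  // The NED ID configured on devices handled by this NED.
  identity ios {
    base ned:cli-ned-id;
  }
}
```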
The Java NED code registers the identity it handles with NSO.
Similar to how we import device models for NETCONF-based devices, we use the ncsc --ncs-compile-bundle
command to import YANG models for NED-handled devices.
Once we have imported such a YANG model into NSO, we can configure the managed device in NSO to be handled by the appropriate NED handler (which is user Java code, more on that later).
When NSO needs to communicate southbound towards a managed device that is not of type NETCONF, it will look for a NED that has registered with the name of the identity, in the case above, the string ios
.
Thus, before NSO attempts to connect to a NED device, and before it tries to sync or manipulate the configuration of the device, user-written Java NED code must have registered with the NSO service manager, indicating which Java class is responsible for the NED with the identity string, in this case ios. This happens automatically when the NSO Java VM gets an instantiate-component request for an NSO package component of type ned.
The component Java class myNed
needs to implement either of the interfaces NedGeneric
or NedCli
. Both interfaces require the NED class to implement the following:
The above three callbacks are used by the NSO Java VM to connect the NED Java class with NSO. They are called when the NSO Java VM receives the instantiate-component request.
The underlying NedMux will start a number of threads, and invoke the registered class with other data callbacks as transactions execute.
Internally in NSO, a YANG module is identified by its namespace. Each such namespace must be unique. Without CDM, the namespace identifier would be the same as the XML namespace defined in the YANG module. But with CDM, the namespace is constructed from a mount ID and the XML namespace. The resulting namespace is sometimes referred to as a crunched namespace.
To implement CDM, NSO uses the YANG Schema Mount, defined in RFC 8528. This document introduces a mount point, under which YANG models are mounted. NSO defines two such mount points, in /devices/device/config
and /devices/device/live-status
. Under these mount points, all the device's YANG modules are mounted.
This implies that when traversing a path in the schema that crosses a mount point, referencing a node under the mount point by a module's name, prefix, or XML namespace may be ambiguous (since there may be multiple versions of the same module, with different definitions of the same node). To resolve this ambiguity, it is necessary to know the mount ID.
A NED package must define a NED ID that identifies the device type for the NED. In NSO, the NED ID is also the mount ID for the crunched namespaces.
This means that the NED ID must be unique for each NED and will serve the dual role of defining the device type and mount ID.
So, when traversing a mount-point, NSO will internally look up the ned-id for the specific device instance and resolve the ambiguities in the module name, prefix, or XML namespace. This way all user-code can and must use paths and XML namespaces just as before. There is no need for user code to ever handle crunched namespaces.
A NED has a version associated with it. A version consists of a sequence of numbers separated by dots (.
). The first two numbers define the major and minor version number, the third number defines the maintenance version number and any following numbers are patch release version numbers.
For instance, the 5.8.1 number indicates a maintenance release (1) on the minor release 5.8, and 5.8.1.1 indicates a patch release (1) on the maintenance release 5.8.1. Any incompatible YANG model change will require the major or minor version number to change, i.e. any 5.8.x version is to be backward compatible with the previous.
When a NED release is replaced with a later maintenance/patch release with the same major/minor version, NSO can do a simple data model upgrade to handle stored instance data in CDB. There is no risk that any data would be lost by this sort of upgrade.
On the other hand, when a NED is replaced by a new major/minor release this becomes a NED migration. These are nontrivial since the YANG model changes can result in loss of instance data if not handled correctly.
NED settings are YANG models, augmented as config in NSO, that control the behavior of the NED. These settings are augmented under /devices/global-settings/ned-settings, /devices/profiles/ned-settings, and /devices/device/ned-settings. Traditionally, these NED settings have been accompanied by a when expression specifying the NED ID for which the settings are legal. With the introduction of CDM, such when expressions on specific NED IDs are not recommended, since the NED ID will change with NED releases.
Instead, there is a need for a 'family' identity that becomes the base for all NED releases of a certain family. The when expressions can then use derived-from syntax to be legal for all NED releases in the family.
As stated above, schema traversal works as before until a mount point is reached in the path. At that point, a lookup of the current mount ID (ned-id) is necessary to resolve any ambiguities in the module name, prefix, or XML namespace. Since the NED, by definition, works on data under a device, any schema traversal in NED code falls under the latter case.
Before CDM, retrieving a CSNode from the Maapi schema for a path was as simple as calling the findCSNode(Namespace, Path) function.
With CDM, the original findCSNode(Namespace, Path) still exists for backward compatibility, but in the NED code case all paths are under a mount point, and hence this function will return an error that a lookup cannot be performed. The reason is that a Maapi call to the NSO service is necessary to retrieve the mount ID for the device. This is accomplished with a mount-id callback, MountIdCb(Maapi, Th), which takes a Maapi instance and optionally a current transaction.
NSO differentiates between managed devices that can handle transactions and devices that can not. This discussion applies regardless of NED type, i.e., NETCONF, SNMP, CLI, or Generic.
NEDs for devices that cannot handle abort must indicate so in the reply of the newConnection() method, indicating that the NED wants a reverse diff in case of abort. Thus, NSO has two different ways to abort a transaction towards a NED: invoke the abort() method with or without a generated reverse diff.
For non-transactional devices, we have no other way of trying out a proposed configuration change than to send the change to the device and see what happens.
The table below shows the 7 different data-related callbacks that could or must be implemented by all NEDs. It also differentiates between 4 different types of devices and what the NED must do in each callback for the different types of devices.
The table lists device types.
The following state diagram depicts the different states the NED code goes through in the life of a transaction.
The CLI NED is magic; it is an entirely model-driven way to script CLI towards all Cisco-like devices. The basic idea is that the Cisco CLI engine found in ConfD can be run in both directions.
A sequence of Cisco CLI commands can be turned into the equivalent manipulation of the internal XML tree that represents the configuration inside NSO/ConfD. This is the normal mode of operations of ConfD, run in Cisco mode.
A YANG model, annotated appropriately, will produce a Cisco CLI. The user can enter Cisco commands, and ConfD will, using the annotated YANG model, parse the Cisco CLI commands and change the internal XML tree accordingly. This is, thus, a model-driven CLI parser and interpreter.
The reverse operation is also possible. Given two different XML trees, each representing a configuration state (in the ConfD case the configuration of a single device, i.e., the device using ConfD as a management framework; in the NSO case the entire network configuration), we can generate the list of Cisco commands that takes us from one XML tree to the other.
This technology is used by NSO to generate CLI commands southbound when we manage Cisco-like devices.
It will become clear later in the examples how the CLI engine is run in forward and reverse mode. The key point, though, is that the Cisco CLI NED Java programmer doesn't have to understand and parse the structure of the CLI; this is entirely done by the NSO CLI engine.
To implement a CLI NED, the following components are required:
A YANG data model that describes the CLI. An important development tool here is ConfD, the Tail-f on-device management toolkit. For NSO to manage a CLI device, it needs a YANG file with exactly the right annotations to produce precisely the CLI of the managed device. In the NSO example collection, we have a few examples of annotated YANG models that render different variants of Cisco CLI. See for example $NCS_DIR/packages/neds/dell-ftos
and $NCS_DIR/packages/neds/cisco-nx
.
Thus, to create annotated YANG files for a device with a Cisco-like CLI, the work procedure is to run ConfD and write a YANG file that renders the correct CLI. This procedure is well described in the ConfD user guide documentation.
Furthermore, this YANG model must declare an identity with ned:cli-ned-id
as base.
The next thing we need is a Java class that implements the NED. This is typically not a lot of code, and the existing example NED Java classes are easily extended and modified to fit other needs. The most important point of the Java NED class code, though, is that the code can be oblivious of the actual CLI commands sent and received.
Java CLI NED code must implement the CliNed
interface.
Thus the Java NED class has the following responsibilities.
It must implement the identification callbacks, i.e. modules()
, type()
, and identity()
.
It must implement the connection-related callback methods newConnection()
, isConnection()
, and reconnect()
.
NSO will invoke newConnection() when it requires a connection to a managed device. It is the responsibility of the newConnection() method to connect to the device, figure out exactly what type of device it is, and return an array of NedCapability objects.
This is very much in line with how a NETCONF connect works and how the NETCONF client and server exchange hello messages.
Finally, the NED code must implement a series of data methods. For example, the method void prepare(NedWorker w, String data) gets a String object that is the set of Cisco CLI commands it shall send to the device.
In the other direction, when NSO wants to collect data from the device, it will invoke void show(NedWorker w, String toptag) for each tag found at the top of the data model(s) loaded for that device. For example, if the NED gets invoked with show(w, "interface"), its responsibility is to invoke the relevant show configuration command for "interface", i.e., show running-config interface, over the connection to the device, and then dumbly reply with all the data the device replies with. NSO will parse the output data and feed it into its internal XML trees.
NSO can order the showPartial() to collect part of the data if the NED announces the capability http://tail-f.com/ns/ncs-ned/show-partial?path-format=FORMAT, where FORMAT is one of the following:
key-path
: support regular instance keypath format.
top-tag
: support top tags under the /devices/device/config
tree.
cmd-path-full
: support Cisco's CLI edit path with instances.
path-modes-only
: support Cisco CLI mode path.
cmd-path-modes-only-existing
: same as path-mode-only
but NSO only supplies the path mode of existing nodes.
As described in previous sections, the CLI NEDs are almost programming-free. The NSO CLI engine takes care of parsing the stream of characters that come from "show running-config [toptag]" and also automatically produces the sequence of CLI commands required to take the system from one state to another.
A generic NED is required when we want to manage a device that speaks neither NETCONF nor SNMP, and cannot be modeled so that ConfD, loaded with those models, gets a CLI that looks almost exactly like the CLI of the managed device. Examples are devices that have other proprietary CLIs, or devices that can only be configured over other protocols such as REST, CORBA, XML-RPC, SOAP, or other proprietary XML solutions.
In a manner similar to the CLI NED, the Generic NED needs to be able to connect to the device, return the capabilities, perform changes to the device, and finally, grab the entire configuration of the device.
The interface that a Generic NED has to implement is very similar to the interface of a CLI NED. The main differences are:
When NSO has calculated a diff for a specific managed device, it will, for CLI NEDs, also calculate the exact set of CLI commands to send to the device, according to the YANG models loaded for the device. In the case of a generic NED, NSO will instead send an array of operations to perform towards the device, in the form of DOM manipulations. The generic NED class will receive an array of NedEditOp objects. Each NedEditOp object contains:
The operation to perform, i.e., CREATED, DELETED, VALUE_SET, etc.
The keypath to the object in question.
An optional value.
When NSO wants to sync the configuration from the device to NSO, the CLI NED only has to issue a series of show running-config [toptag]
commands and reply with the output received from the device. A generic NED has to do more work. It is given a transaction handler, which it must attach to over the Maapi interface. Then the NED code must - by some means - retrieve the entire configuration and write into the supplied transaction, again using the Maapi interface.
Once the generic NED is implemented, all other functions in NSO work precisely in the same manner as with NETCONF and CLI NED devices. NSO still has the capability to run network-wide transactions. The caveat is that to abort a transaction towards a device that doesn't support transactions, we calculate the reverse diff and send it to the device, i.e. we automatically calculate the undo operations.
Another complication with generic NEDs is how the NED class shall authenticate towards the managed device. This depends entirely on the protocol between the NED class and the managed device. If SSH is used to a proprietary CLI, the existing authgroup structure in NSO can be used as is. However, if some other authentication data is needed, it is up to the generic NED implementer to augment the authgroups in tailf-ncs.yang
accordingly.
We must also configure a managed device, indicating that its configuration is handled by a specific generic NED. Below we see that the NED with identity xmlrpc
is handling this device.
The example examples.ncs/generic-ned/xmlrpc-device in the NSO examples collection implements a generic NED that speaks XML-RPC to three HTTP servers. The HTTP servers run the Apache XML-RPC server code, and the NED code manipulates the three HTTP servers using a number of predefined XML-RPC calls.
A good starting point when we wish to implement a new generic NED is the ncs-make-package --generic-ned-skeleton ...
command, which is used to generate a skeleton package for a generic NED.
NSO ships with several CLI NED examples. A good starting point is $NCS_DIR/packages/neds/cisco-ios, which contains a CLI NED that allows NSO to control Cisco IOS/Catalyst routers.
Implementing a CLI NED is almost entirely a YANG model activity. The tool to use while developing the YANG model is ConfD. The task is to write a YANG model that, when run with ConfD, makes ConfD produce a CLI that is as close as possible to the target device, in this case, a Cisco IOS router.
The ConfD example found under $CONFD_DIR/examples.confd/cli/c7200 doesn't cover the entire Cisco c7200 router; it only covers certain aspects of the device. This is important: to have NSO manage a device with a Cisco-like CLI, we do not have to model the entire device, we only need to cover the commands that we intend to use. When the show() callback issues its show running-config [toptag] command, and the device replies with data that is fed to NSO, NSO will ignore all command dump output that is not covered by the loaded YANG models.
Thus, whichever Cisco-like device we wish to manage, we must first have YANG models that cover all aspects of the device we wish to use from NSO. Tail-f ships various YANG models covering different variants of Cisco routers and switches in the NSO example collection. Any of these is a good starting point. Once we have a YANG model, we load it into NSO and modify the example CLI NED class to return the NedCapability list of the device.
The NED code gets to see all data that goes to and from the device. If it's impossible or too hard to get the YANG model exactly right for all commands, a last resort is to let the NED code modify the data inline. Hopefully, this will never be necessary.
NSO can order the showPartial()
to collect part of the data if the NED announces the capability http://tail-f.com/ns/ncs-ned/show-partial?path-format=key-path
.
A generic NED always requires more work than a CLI NED. The generic NED needs to know how to map arrays of NedEditOp
objects into the equivalent reconfiguration operations on the device. Depending on the protocol and configuration capabilities of the device, this may be arbitrarily difficult.
Regardless of the device, we must always write a YANG model that describes it. The array of NedEditOp objects that the generic NED code gets exposed to is relative to the YANG model that we have written for the device. Again, this model doesn't necessarily have to cover all aspects of the device.
Often, a useful technique with generic NEDs is to write a pyang plugin to generate code for the generic NED. Depending on the device, it may be possible to generate Java code from a pyang plugin that covers most or all aspects of mapping an array of NedEditOp objects into the equivalent reconfiguration commands for the device.
Pyang is an extensible, open-source YANG parser (written by Tail-f), available at http://www.yang-central.org. pyang is also part of the NSO release. A number of plugins are shipped in the NSO release; for example, $NCS_DIR/lib/pyang/pyang/plugins/tree.py is a good plugin to start with if you wish to write your own.
$NCS_DIR/examples.ncs/generic-ned/xmlrpc-device
is a good example to start with if we wish to write a generic NED. It manages a set of devices over the XML-RPC protocol. In this example, we have:
Defined a fictitious YANG model for the device.
Implemented an XML-RPC server exporting a set of RPCs to manipulate that fictitious data model. The XML-RPC server runs the Apache org.apache.xmlrpc.server.XmlRpcServer Java package.
Implemented a Generic NED which acts as an XML-RPC client speaking HTTP to the XML-RPC servers.
The example is self-contained, and we can, using the NED code, manipulate these XML-RPC servers in a manner similar to all other managed devices.
As mentioned earlier, the NedEditOp
objects are relative to the YANG model of the device, and they are to be translated into the equivalent reconfiguration operations on the device. Applying reconfiguration operations may only be valid in a certain order.
For Generic NEDs, NSO provides a feature to ensure dependency rules are being obeyed when generating a diff to commit. It controls the order of operations delivered in the NedEditOp
array. The feature is activated by adding the following option to package-meta-data.xml
:
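A minimal sketch of the option element, assuming the standard package-meta-data.xml option syntax:

```xml
<option>
  <name>ordered-diff</name>
</option>
```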
When the ordered-diff
flag is set, the NedEditOp
objects follow YANG schema order and consider dependencies between leaf nodes. Dependencies can be defined using leafrefs and the tailf:cli-diff-after
, tailf:cli-diff-create-after
, tailf:cli-diff-modify-after
, tailf:cli-diff-set-after
, tailf:cli-diff-delete-after
YANG extensions. Read more about the above YANG extensions in the Tail-f CLI YANG extensions man page.
A device we wish to manage using a NED usually has not only configuration data that we wish to manipulate from NSO; it typically also has a set of commands that do not relate to configuration. The commands on the device that we wish to invoke from NSO must be modeled as actions and compiled using a special ncsc
command for NED data models that do not directly relate to configuration data on the device.
The NSO example $NCS_DIR/examples.ncs/generic-ned/xmlrpc-device
contains an example where the managed device, a fictitious XML-RPC device, contains a YANG snippet:
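The snippet itself is not reproduced here; an action in a device YANG model has roughly this shape (the ping action and its leaves are made-up stand-ins, using the tailf:action extension):

```yang
tailf:action ping {
  tailf:actionpoint ping-point;
  input {
    leaf count {
      type uint8;
    }
  }
  output {
    leaf result {
      type string;
    }
  }
}
```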
When that action YANG is imported into NSO, it ends up under the managed device. We can invoke the action on the device as:
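Schematically, assuming the made-up ping action above on a device named xdev:

```
admin@ncs# devices device xdev ping count 3
result ok
```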
The NED code is obviously involved here. All NEDs must always implement:
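Among these is the command() callback, shown here schematically; consult the NED API javadoc of your NSO version for the exact signature:

```java
void command(NedWorker w, String cmdName, ConfXMLParam[] params)
    throws NedException, IOException;
```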
The command()
method gets invoked in the NED; the code must then execute the command. The input parameters in the params
parameter correspond to the data provided in the action. The command()
method must reply with another array of ConfXMLParam
objects.
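A minimal sketch of such an implementation (all names are made up, and the reply is hard-coded rather than coming from a device):

```java
public void command(NedWorker worker, String cmdName, ConfXMLParam[] params)
        throws NedException, IOException {
    // Fake implementation: reply immediately with a canned result instead
    // of contacting the device. "r" is the module prefix and "result" the
    // output leaf of the (made-up) action model.
    worker.commandResponse(new ConfXMLParam[] {
        new ConfXMLParamValue("r", "result", new ConfBuf("ok"))
    });
}
```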
The above code is fake; on a real device, the job of the command()
method is to establish a connection to the device, invoke the command, parse the output, and finally reply with a ConfXMLParam
array.
The purpose of implementing NED commands is usually to expose device commands to the programmatic APIs in the NSO DOM tree.
NED devices have runtime data and statistics. The first step in collecting non-configuration data from a NED device is to model the statistics data we wish to gather. In normal YANG files, it is common to have the runtime data nested inside the configuration data. For NED devices, we have instead chosen to separate configuration data and runtime data. In the case of the archetypical CLI device, show running-config ... and friends display the running configuration of the device, whereas various other show ... commands display runtime data, for example show interfaces and show routes. Different commands exist for different types of routers/switches, and in particular, the tabular output formats differ between device types.
To expose runtime data from a NED controlled device, regardless of whether it's a CLI NED or a Generic NED, we need to do two things:
Write YANG models for the aspects of runtime data we wish to expose northbound in NSO.
Write Java NED code that is responsible for collecting that data.
The NSO NED for the Avaya 4k device contains a data model for some real statistics for the Avaya router, as well as the accompanying Java NED code. Let's start by taking a look at the YANG model for the stats portion:
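The original module is not reproduced here; a minimal stand-in (module name, namespace, and counters are made up) could look like this:

```yang
module avaya-stats {
  namespace "http://example.com/avaya-stats";
  prefix avstats;

  list interface {
    config false;
    key name;
    leaf name { type string; }
    leaf rx-pkts { type uint64; }
    leaf tx-pkts { type uint64; }
  }
}
```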
It's a config false;
list of counters per interface. We compile the NED stats module with the --ncs-compile-module
flag or with the --ncs-compile-bundle
flag. It is the same non-config module that contains both the runtime data and the commands and RPCs.
The config false;
data from a module that has been compiled with the --ncs-compile-module
flag will end up mounted under /devices/device/live-status
tree. Thus, running the NED towards a real router, the collected counters appear under that path.
It is the responsibility of the NED code to populate the data in the live device tree. Whenever a northbound agent tries to read any data in the live device tree for a NED device, the NED code is invoked.
The NED code implements an interface called NedConnection. This interface contains, among other methods:
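The relevant method here is showStatsPath(), shown schematically; check the NED API javadoc for the exact signature in your NSO version:

```java
void showStatsPath(NedWorker w, int th, ConfPath path)
    throws NedException, IOException;
```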
This interface method is invoked by NSO in the NED. The Java code must return what is requested, but it may also return more. The Java code always needs to signal errors by invoking NedWorker.error()
and success by invoking NedWorker.showStatsPathResponse()
. The latter function indicates what is returned, and also how long it shall be cached inside NSO.
The reason for this design is that it is common for many show
commands to work on, for example, an entire interface or some other item in the managed device. Say that the NSO operator (or MAAPI code) invokes:
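For example (a hypothetical path following the stand-in stats model above):

```
admin@ncs# show devices device r0 live-status interface eth0 rx-pkts
```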
requesting a single leaf. The NED Java code can then decide to execute any arbitrary show
command towards the managed device, parse the output, and populate as much data as it wants. The Java code also decides how long NSO shall cache the data.
When the showStatsPath()
is invoked, the NED should indicate the state or value of the node indicated by the path. If a leaf was requested, the NED should write the value of this leaf to the provided transaction handler (th) using MAAPI, or indicate its absence as described below. If a list entry or a presence container was requested, the NED should indicate the presence or absence of the element. If the whole list is requested, the NED should populate the keys for this list. Often, requesting such data from the actual device will give the NED more data than specifically requested, in which case the worker is free to write other values as well. The NED is not limited to populating the subtree indicated by the path; it may also write values outside this subtree. NSO will then not request those paths but read them directly from the transaction. Different timeouts can be provided for different paths.
If a leaf does not have a value or does not exist, the NED can indicate this by returning a TTL for the path to the leaf, without setting the value in the provided transaction. This has changed from earlier versions of NSO. The same applies to optional containers and list entries. If the NED populates the keys for a certain list (both when it is requested to do so or when it decided to do so because it has received this data from the device), it should set the TTL value for the list itself to indicate the time the set of keys should be considered up to date. It may choose to provide different TTL values for some or all list entries, but it is not required to do so.
Sometimes we wish to use a different protocol to collect statistics from the live tree than the protocol that is used to configure a managed device. There are many interesting use cases where this pattern applies. For example, if we wish to access SNMP data as statistics in the live tree on a Juniper router, or alternatively if we have a CLI NED to a Cisco-type device, and wish to access statistics in the live tree over SNMP.
The solution is to configure additional protocols for the live tree. We can have an arbitrary number of NEDs associated with statistics data for an individual managed device.
The additional NEDs are configured under /devices/device/live-status-protocol
.
In the configuration snippet below, two additional NEDs are configured for statistics data.
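The snippet is not reproduced here; schematically, it could look like this (device, NED, and authgroup names are made up):

```
devices device router0
 live-status-protocol snmp-stats
  device-type snmp version v2c
  authgroup snmp-auth
 !
 live-status-protocol cli-stats
  device-type cli ned-id stats-ned
 !
!
```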
One important task when implementing a NED of any type is to make it mimic the device's handling of default values as closely as possible. Network equipment typically deals with default values in many different ways.
Some devices display default values on leafs even if they have not been explicitly set. Others use trimming: if a leaf is set to its default value, it is 'unset' and disappears from the device's configuration dump.
It is the responsibility of the NED to make NSO aware of how the device handles default values. This is done by registering a special NED Capability entry with NSO. Two modes are currently supported: trim
and report-all
.
This is the typical behavior of a Cisco IOS device. The simple YANG snippet below illustrates it: a container with a boolean leaf whose default value is true.
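A minimal stand-in for the snippet (the container name is made up; the enabled leaf follows the surrounding text):

```yang
container interface {
  leaf enabled {
    type boolean;
    default true;
  }
}
```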
Try setting the leaf to true in NSO and commit, then compare the configurations.
The result shows that the configurations differ. The reason is that the device does not display the value of the leaf enabled: it has been trimmed since it has its default value. NSO is now out of sync with the device.
To solve this issue, make the NED tell NSO that the device trims default values by registering an extra NED Capability entry in the Java code.
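The entry to register is the standard with-defaults capability (RFC 6243) with basic mode trim; exactly how the NedCapability object is constructed depends on the NED API version, so only the URI is shown here:

```java
static final String TRIM_CAPABILITY =
    "urn:ietf:params:netconf:capability:with-defaults:1.0?basic-mode=trim";
```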
Now, try the same operation again.
The NSO is now in sync with the device.
Some devices display default values for leafs even if they have not been explicitly set. The simple YANG snippet below illustrates this behavior: a list containing a key and a leaf with a default value.
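A minimal stand-in for the snippet (the threshold leaf follows the surrounding text, the rest is made up):

```yang
list server {
  key name;
  leaf name {
    type string;
  }
  leaf threshold {
    type uint16;
    default 100;
  }
}
```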
Try creating a new list entry in NSO and commit, then compare the configurations.
The result shows that the configurations differ; NSO is out of sync. This is because the device displays the default value of the threshold leaf even though it has not been explicitly set through NSO.
To solve this issue, make the NED tell NSO that the device reports all default values by registering an extra NED Capability entry in the Java code.
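As above, only the capability URI is shown, this time with basic mode report-all:

```java
static final String REPORT_ALL_CAPABILITY =
    "urn:ietf:params:netconf:capability:with-defaults:1.0?basic-mode=report-all";
```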
Now, try the same operation again.
The NSO is now in sync with the device.
When implementing a NED, it sometimes happens that the device has truly tricky behavior regarding how different parts of the configuration relate to each other. This can be so complex that it is impossible to model in YANG.
Examples of such are:
A device that alters unrelated configuration. For instance, if a value of leaf A is changed through NSO the device will also automatically modify the value of leaf B.
A device that creates additional configuration. For instance, if a new entry in list A is created through NSO the device will also automatically create an entry in the sub-list B.
Both these cases will result in out-of-sync issues in the NSO.
One fairly straightforward way to solve this is by using set hooks in the NED. A set hook is a callback routine in Java that is mapped to something in the YANG model. This can for instance be a certain leaf or list in the model. The set hook can be configured to be called upon different operations. Typically this involves create, set, or delete operations.
Example: Using a set hook to create additional configuration
Assume a device that creates additional configuration as described above. The YANG snippet below will be used to illustrate this.
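A minimal stand-in for the snippet, with a list mylist containing a sub-list b:

```yang
list mylist {
  key name;
  leaf name {
    type string;
  }
  list b {
    key name;
    leaf name {
      type string;
    }
  }
}
```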
Try creating a new list entry in NSO and commit, then compare the configurations.
The device has automatically created the entry default in the sub-list b when it created the mylist entry. The result is that NSO is now out of sync with the device.
The solution is to implement a set hook in the NED that makes NSO mimic the device properly. In this case, it shall create an entry named default in the sub-list b each time an entry is created in mylist.
The Java implementation of the set hook would look something like this:
Finally, the YANG model is extended with an extra annotation:
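One possible shape, assuming the set hook is attached through a tailf:callpoint with a tailf:set-hook statement (the callpoint name is made up and must match the Java registration):

```yang
list mylist {
  key name;
  tailf:callpoint mylist-hook {
    tailf:set-hook subtree;
  }
  leaf name {
    type string;
  }
  list b {
    key name;
    leaf name {
      type string;
    }
  }
}
```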
Now, try the same operation again: create a new list entry in NSO, commit, and compare the configurations.
NSO has now automatically created the default entry in the sub-list b, the same way the device does, and NSO is in sync with the device.
The possibility to do a dry-run on a transaction is a feature in NSO that allows examination of the changes to be pushed out to the managed devices in the network. The output can be produced in different formats, namely cli
, xml
, and native
. To produce a dry-run in the native output format, NSO needs to know the exact syntax used by the device, and the task of converting the commands or operations produced by NSO into the device-specific output belongs to the corresponding NED. This is the purpose of the prepareDry()
callback in the NED interface.
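Schematically, the callback looks like this; consult the NED API javadoc for the exact signature in your NSO version:

```java
void prepareDry(NedWorker w, String data) throws NedException, IOException;
```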
To be able to invoke a callback an instance of the NED object needs to be created first. There are two ways to instantiate a NED:
newConnection()
callback that tells the NED to establish a connection to the device, which can later be used to perform any action, such as showing configuration, applying changes, or viewing operational data, as well as producing dry-run output.
Optional initNoConnect()
callback that tells the NED to create an instance that does not need to communicate with the device, and hence must not establish a connection or otherwise communicate with the device. This instance will only be used to calculate dry-run output. It is possible for a NED to reject the initNoConnect() request if it cannot calculate the dry-run output without establishing a connection to the device, for example, if the NED is capable of managing devices with different flavors of syntax and it is not known at this point which syntax is used by this particular device.
The following state diagram displays NED states specific to the dry-run scenario.
NED packages should follow some naming conventions. A package is a directory where the package name is the same as the directory name. At the top level of this directory, a file called package-meta-data.xml
must exist. The package name in that file should follow <vendor>-<ned_name>-<ned_version>
for example, cisco-iosxr-cli-7.29
. A package may also be a tar archive with the same directory layout. The tar archive can be either uncompressed with suffix .tar
, or gzip-compressed with suffix .tar.gz
or .tgz
. The archive file should also follow some naming conventions, it should be named by ncs-<ncs_version>-<vendor>-<ned_name>-<ned_version>.<suffix>
for example, ncs-5.4-cisco-iosxr-7.29.1.tar.gz.
The NED name is expected to be two words (no dashes within the words) separated by a dash, for example, cisco-iosxr
. It may also include NED type at the end, for example, cisco-iosxr_netconf
.
The YANG modeling language supports the notion of a module revision
. It allows users to distinguish between different versions of a module, so the module can evolve over time. If you wish to use a new revision of a module for a managed device, for example, to access new features, you generally need to create a new NED.
When a model evolves quickly and you have many devices that require many different revisions, you will need to maintain a large number of NEDs, which are mostly the same. This can become especially burdensome during NSO version upgrades, when all NEDs may need to be recompiled.
When a YANG module is only updated in a backward-compatible way (following the upgrade rules in RFC6020 or RFC7950), the NSO compiler, ncsc
, allows you to pack multiple module revisions into the same package. This way, a single NED with multiple device model revisions can be used, instead of multiple NEDs. Based on the capabilities exchange, NSO will then use the correct revision for communication with each device.
However, there is a major downside to this approach. While the exact revision is known for each communication session with the managed device, the device model in NSO does not have that information. For that reason, the device model always uses the latest revision. When pushing configuration to a device that only supports an older revision, NSO silently drops the unsupported parts. This may have surprising results, as the NSO copy can contain configuration that is not really supported on the device. Use the no-revision-drop commit parameter when you want to make sure you are not committing a config that is not supported by a device.
If you still wish to use this functionality, you can create a NED package with the ncs-make-package --netconf-ned
command as you would otherwise. However, the supplied source YANG directory should contain YANG modules with different revisions. The files should follow the module-or-submodule-name@revision-date.yang naming convention, as specified in RFC 6020. Some versions of the compiler require you to use the --no-fail-on-warnings
option with the ncs-make-package
command or the build process may fail.
The examples.ncs/development-guide/ned-upgrade/yang-revision
example shows how you can perform a YANG model upgrade. The original, 1.0 version of the router NED uses the router@2020-02-27.yang
YANG model. First, it is updated to the version 1.0.1 router@2020-09-18.yang
using a revision merge approach. This is possible because the changes are backward-compatible.
In the second part of the example, the updates in router@2022-01-25.yang
introduce breaking changes, therefore the version is increased to 1.1 and a different ned-id is assigned to the NED. In this case, you can't use revision merge and the usual NED migration procedure is required.
Before a NETCONF-capable device can be managed by NSO, a corresponding NETCONF NED needs to be loaded. While no code needs to be written for such a NED, it must contain YANG data models for this kind of device.
While in some cases the YANG models may be provided by the device's vendor, devices that implement RFC 6022 YANG Module for NETCONF monitoring are able to provide their YANG models using the functionality described in this RFC.
The NETCONF NED builder functionality helps the NSO developer onboard new kinds of devices by fetching the YANG models from a reference device and building a NETCONF NED from them.
The following steps need to be performed to build a new NED using NETCONF NED builder functionality:
Configure the reference device in NSO under /devices/device
list. Use the base netconf
NED ID for this device as there is no NED ID specific to this kind of device defined in NSO yet.
Create a new NETCONF NED builder project. To access the NETCONF NED builder data model, the devtools session parameter should be set to true, using the devtools true (C-style) or set devtools true (J-style) command in the CLI. The project's family name typically consists of the vendor and the name of the OS the device is running (for example, cisco-iosxr), and the major version is the NED version, which may or may not reflect the actual device's OS version. The idea is that the NED major version only needs to change if backward-incompatible changes have been introduced in the device's YANG model compared to the previous NED version. The rules for backward compatibility between YANG modules are described in RFC 6020, Section 10 (Updating a Module).
Running the fetch-module-list
action initiates a NETCONF connection to the device and collects the list of YANG modules supported by the device, which is stored in the module
list under the NETCONF NED builder project.
Once the list of modules has been collected, the developer needs to decide which YANG modules to include in the NED and mark them using the select action on the corresponding entries in the module list. By default, all modules are deselected. Once a module is selected, it is downloaded from the device in the background.
Once the modules are selected and successfully downloaded, the developer may initiate a NED build using the build-ned
action. See Building the NED.
A successfully built NED may be exported in form of a tar file using the export-ned
action. The tar file name is constructed according to the naming convention ( ncs-<ncs-version>-<ned-family>-nc-<ned-version>.tar.gz
) and the user chooses the directory in which the file is created. The user must have write access to that directory.
An alternative to letting NSO build the NED is to create a development version of the NED using make-development-ned
action. This is useful if, for example, there is an intention to maintain this version of the NED with upcoming minor (backward-compatible) YANG model revisions, if the NED needs to support creating a NETSIM device, or if the YANG models provided by the device need to be manually edited due to compilation errors. A development version of the NED is not built; instead, it contains the Makefile
with the rules to build the NED. Essentially, it is the same package that would be created using the ncs-make-package
tool with --netconf-ned
flag.
It is important to note that deleting the NETCONF NED builder project also deletes the list of modules along with the selection data, all of the downloaded YANG modules, and the working copy of the NED. Only the exported NED tarball or development NED is kept. The selection data may also be saved in the form of a selection profile, as described in Selecting the Modules.
Selecting the modules for inclusion in the NED is a crucial step in the process of building the NED. The recommendation is to select only the modules necessary to perform the tasks of the given NSO installation, to reduce memory consumption, the size of sync-from, and upgrade times.
For example, if the aim of the NSO installation is exclusively to manage BGP on the device and necessary configuration is defined in a separate module, then only this module needs to be selected. If there are several services running within the NSO installation, then it might be necessary to include more data models in the single NED for a given kind of device. However, if the NSO installation is used, for example, to take full backups of the devices' configuration, then all modules need to be included in the NED.
Selecting a module will also, by default, select the module's dependencies: modules that are known to deviate this module in the device's implementation, and modules imported by the selected modules. To disable this behavior, the no-dependencies
flag may be used, but note that with dependencies missing, the NED will fail to build. Deselecting a module, on the other hand, does not automatically deselect modules that depend on it.
Using select
action on the module list entry will set a selection flag on it.
Once the module is selected the download starts automatically in the background.
We also see that the NETCONF NED builder identified the module Cisco-IOS-XR-types as being imported by Cisco-IOS-XR-ifmgr-cfg, so it has also been selected.
CLI wildcards may be used to select multiple modules.
One may want to reuse a selection of modules, for example, if the same modules should be selected for the new major version of the NED as for the previous one. For this purpose, the NETCONF NED builder supports selection profiles.
The selection profile is configuration data. It may be created in two ways:
By exporting the current selection from an existing project, using the save-selection
action on the project list entry.
By manually creating a profile in configuration mode. As with any other configuration, it may also, for example, be exported to an XML file and loaded later.
A profile is applied to a certain project using apply
action on the profile list entry.
It is important to note that while select
action selects modules together with their dependencies, a profile is assumed to be an exhaustive selection of modules, and hence the dependencies are ignored. This is also indicated by the no-dependencies
status flag on the selected modules.
Modules that have been selected but are no longer needed may be deselected using the deselect
action. It should be noted that this action only deselects the target module; it does not automatically deselect modules that depend on it, nor its dependencies (so the select
and deselect
actions are asymmetrical in this regard).
Modules that were downloaded prior to being deselected are not removed from the NETCONF NED builder cache, but they will not be included in the NED package upon building or making a development package. At the same time, a deselected module is not removed from a package that has already been built.
The NED build is triggered using build-ned
action on the project list entry.
If no error was reported, the build was successful. There might be warnings issued by the compiler; these are saved in the build-warning
leaf under the module list entries. If the action returned an error mentioning that it was not possible to compile the NED bundle, then in addition to warnings, there will be errors saved in the build-error
leaf under the module list entries.
Possible ways to resolve this error are to:
Deselect the module if it is not critical for NSO operations towards this kind of device.
Use the make-development-ned
action to create a development version of the NED and fix the issue in the YANG module.
In addition to the processed compiler output that is saved per module list entry, it may be necessary to see the full compiler output. It is saved in the compiler-output
leaf under the project list entry. The leaf is hidden by the hide group debug and may be accessed in the CLI using the unhide debug command, provided the hide group is configured in ncs.conf.
The idea is to write a YANG data model and feed that into the NSO CLI engine such that the resulting CLI mimics that of the device to manage. This is fairly straightforward once you have understood how the different constructs in YANG are mapped into CLI commands. The data model usually needs to be annotated with specific Tail-f CLI extensions to tailor exactly how the CLI is rendered.
This chapter will describe how the general principles work and give a number of cookbook-style examples of how certain CLI constructs are modeled.
The CLI NED is primarily designed to be used with devices that have a CLI similar to the CLIs on typical Cisco boxes (i.e. IOS, XR, NX-OS, etc.). However, if the CLI follows the same principles but with a slightly different syntax, it may still be possible to use a CLI NED if some of the differences are handled by the Java part of the CLI NED. This chapter describes how this can be done.
Let's start with the basic data model for CLI mapping. YANG consists of three major elements: containers, lists, and leaves. For example:
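The original example is not reproduced here; a minimal stand-in with all three constructs could be:

```yang
container interface {
  list ethernet {
    key id;
    leaf id {
      type string;
    }
    leaf description {
      type string;
    }
    leaf mtu {
      type uint16;
    }
  }
}
```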
The basic rendering of the constructs is as follows: containers are rendered as command prefixes that can be stacked at any depth; leaves are rendered as commands that take one parameter; lists are rendered as submodes, where the key of the list is rendered as a submode parameter. The example above would result in the command interface ethernet <id> for entering the interface ethernet submode: interface is a container rendered as a prefix, and ethernet is a list rendered as a submode. Two additional commands (description and mtu in the stand-in above) would be available in the submode.
A typical configuration with two interfaces could look like this:
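Schematically:

```
interface ethernet 1
 mtu 1400
!
interface ethernet 2
 mtu 1500
!
```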
Note that it makes sense to add help texts to the data model, since these texts will be visible in NSO and help the user see the mapping between the J-style CLI in NSO and the CLI on the target device. The data model above may look like the following with proper help texts:
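A sketch based on the stand-in model above, using the tailf:info extension for help texts:

```yang
container interface {
  tailf:info "Configure interfaces";
  list ethernet {
    tailf:info "Ethernet interface";
    key id;
    leaf id {
      type string;
      tailf:info "Interface id";
    }
    leaf description {
      type string;
      tailf:info "Interface description";
    }
    leaf mtu {
      type uint16;
      tailf:info "Maximum transmission unit";
    }
  }
}
```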
We will generally not include the help texts in the examples below to save space, but they should be present in a production data model.
The basic rendering suffices in many cases but not in all situations. What follows is a list of ways to annotate the data model in order to make the CLI engine mimic a device.
Sometimes you want a number of instances (a list) but do not want a submode. For example:
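A sketch using the tailf:cli-suppress-mode extension (the permit-host list is made up):

```yang
list permit-host {
  tailf:cli-suppress-mode;
  key ip;
  leaf ip {
    type string;
  }
}
```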
The above would result in commands such as permit-host 10.0.0.1, one per list entry, without entering a submode; a typical show-config output would likewise list one permit-host line per entry.
Sometimes you want a submode to be created without having a list instance, for example a submode called aaa where all aaa configuration is located.
This is done by using the tailf:cli-add-mode extension. For example:
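A sketch (the contents of the aaa container are made up):

```yang
container aaa {
  tailf:cli-add-mode;
  container accounting {
    leaf enabled {
      type boolean;
    }
  }
}
```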
This would result in the command aaa for entering the container. However, sometimes the CLI requires that a certain set of elements also be set when entering the submode, but without being a list. An example is the police rules inside a policy map on the Cisco 7200.
Here the leaves with the annotation tailf:cli-hide-in-submode
are not present as commands once the submode has been entered; instead, they are only available as options to the police command when entering the police submode.
Often a command is defined as taking multiple parameters in a typical Cisco CLI. This is achieved in the data model by using the annotations tailf:cli-sequence-commands
, tailf:cli-compact-syntax
, tailf:cli-drop-node-name
, and possibly tailf:cli-reset-siblings
.
For example:
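The original model is not reproduced here; a simplified stand-in using these annotations could be:

```yang
container timeout {
  tailf:cli-sequence-commands;
  tailf:cli-compact-syntax;
  tailf:cli-reset-siblings;
  leaf value {
    tailf:cli-drop-node-name;
    type uint32;
  }
  leaf unit {
    tailf:cli-drop-node-name;
    type enumeration {
      enum milli;
      enum secs;
    }
  }
}
```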
This results in the command timeout <value> [milli | secs].
The tailf:cli-sequence-commands
annotation tells the CLI engine to process the leaves in sequence.
The tailf:cli-reset-siblings
tells the CLI to reset all leaves in the container if one is set. This is necessary to ensure that no lingering config remains from a previous invocation of the command where more parameters were configured.
The tailf:cli-drop-node-name
tells the CLI that the leaf name shouldn't be specified.
The tailf:cli-compact-syntax
annotation tells the CLI that the leaves should be formatted on one line, e.g. timeout 100 secs, as opposed to each leaf being displayed on a separate line without the annotation.
Alternatively, the device model can use two separate value leaves, milli and secs, with when constructs controlling which of them the numerical value should end up in.
This command could also be written using a choice construct, with one case per unit.
Sometimes the tailf:cli-incomplete-command
is used to ensure that all parameters are configured. The cli-incomplete-command
annotation only applies to the C- and I-style CLIs. To ensure that prior leaves in a container are also configured when the configuration is written using the J-style CLI or NETCONF, proper must
declarations should be used.
Another example is below where tailf:cli-optional-in-sequence
is used.
The tailf:cli-optional-in-sequence
means that the parameters should be processed in sequence, but a parameter can be skipped. However, if a parameter is specified, then only parameters later in the container can follow it.
It is also possible to have some parameters in sequence initially in the container, and then the rest in any order. This is indicated by the tailf:cli-break-sequence annotation: the leaves before the break must be given in strict order, and the remaining leaves can then be given in any order.
Sometimes a command for entering a submode has parameters that are not really key values, i.e. not part of the instance identifier, but still need to be given when entering the submode. For example:
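A sketch following the names in the surrounding text (types and details are made up):

```yang
list service-group {
  key name;
  leaf name {
    type string;
  }
  leaf tcpudp {
    tailf:cli-hide-in-submode;
    type enumeration {
      enum tcp;
      enum udp;
    }
  }
  leaf backup-server-event-log {
    type empty;
  }
  leaf extended-stats {
    type empty;
  }
}
```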
In this case, the tcpudp
is a non-key leaf that needs to be specified as a parameter when entering the service-group
submode. Once in the submode, the commands backup-server-event-log
and extended-stats are available. Leaves with the tailf:cli-hide-in-submode
attribute are given after the last key, in the sequence they appear in the list.
It is also possible to allow leaf values to be entered in between key elements. Consider a list that is not mapped to a submode, with two keys, read and remote, an optional oid leaf that can be specified before the remote key, and, after the last key, an optional mask parameter. The use of the tailf:cli-expose-key-name
annotation means that the key names should be part of the command, which they are not by default.
The tailf:cli-reset-container
attribute means that all leaves in the container will be reset if any leaf is given.
Some devices require that a setting be removed before it can be changed, for example, the service-group list above. This is indicated with the tailf:cli-remove-before-change
annotation. It can be used both on lists and on leaves. A leaf example:
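A sketch, combining the two annotations discussed here on the source-ip leaf:

```yang
leaf source-ip {
  tailf:cli-remove-before-change;
  tailf:cli-no-value-on-delete;
  type inet:ipv4-address;
}
```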
This means that the diff sent to the device will first contain a no source-ip
command, followed by a new source-ip command to set the new value.
The data model also uses the tailf:cli-no-value-on-delete
annotation, which means that the leaf value should not be present in the no command. With the annotation, a diff to modify the source IP from 1.1.1.1 to 2.2.2.2 would be no source-ip followed by source-ip 2.2.2.2; without the annotation, it would be no source-ip 1.1.1.1 followed by source-ip 2.2.2.2.
By default, a diff for an ordered-by user
list contains information about where a new item should be inserted. This is typically not supported by the device. Instead, the commands (diff) sent to the device need to remove all items following the new item, and then reinsert the items in the proper order. This behavior is controlled using the tailf:cli-long-obu-diff
annotation. For example:
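A sketch of such a list (the access-list modeling is simplified to a single string key):

```yang
list access-list {
  ordered-by user;
  tailf:cli-long-obu-diff;
  tailf:cli-suppress-mode;
  key rule;
  leaf rule {
    type string;
  }
}
```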
Suppose we have the access list:
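Hypothetical entries:

```
access-list permit 10.0.0.0/8
access-list permit 192.168.0.0/16
```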
And we want to change this to:
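Continuing the hypothetical example, with a deny entry inserted in the middle:

```
access-list permit 10.0.0.0/8
access-list deny 172.16.0.0/12
access-list permit 192.168.0.0/16
```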
We would generate the diff with the tailf:cli-long-obu-diff
:
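With the annotation, everything from the insertion point onward is removed and re-added in order:

```
no access-list permit 192.168.0.0/16
access-list deny 172.16.0.0/12
access-list permit 192.168.0.0/16
```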
Without the annotation, the diff would instead contain the new entry together with information about where to insert it, which the device typically cannot parse.
Often in a config, when a leaf is set to its default value, it is not displayed by the show running-config
command, but we still need to set it explicitly. Suppose we have the leaf state, whose default value is active:
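A stand-in model for the leaf:

```yang
leaf state {
  type enumeration {
    enum active;
    enum block;
  }
  default active;
}
```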
If the device state is block
and we want to set it to active
, i.e. the default value, the default behavior is to send no state block to the device.
This will not work. The correct command should be state active.
The way to achieve this is to annotate the leaf as follows:
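The same stand-in leaf, now annotated with tailf:cli-show-with-default:

```yang
leaf state {
  type enumeration {
    enum active;
    enum block;
  }
  default active;
  tailf:cli-show-with-default;
}
```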
This way, a value for state will always be generated. This may seem unintuitive, but the reason it works comes from how the diff is calculated. When generating the diff, the target configuration and the desired configuration are compared line by line. The target config will be state block, and the desired config will be state active.
This will be interpreted as a leaf value change, and the resulting diff will be to set the new value, i.e. active.
However, without the cli-show-with-default
option, the desired config will be an empty line, i.e. no value set. When we compare the two lines, we get state block (current config) versus an empty line (desired config). This will result in a command to remove the configured leaf, i.e. no state block, which does not work.
What you see in the C-style CLI when you do show configuration
is the set of commands needed to go from the running config to the configuration in your current session. It usually corresponds to the commands you have just issued in your CLI session, but not always.
The output is actually generated by comparing the two configurations, i.e. the running config and your current uncommitted configuration. It is done by running show running-config
on both the running config and your uncommitted config and then comparing the output line by line. Each line is complemented by some meta information which makes it possible to generate a better diff.
For example, if you modify a leaf value, say set the mtu
to 1400
and the previous value was 1500
. The two configs will then be:
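Schematically (running config first, then the uncommitted session config):

```
interface FastEthernet0/0/1
 mtu 1500
!

interface FastEthernet0/0/1
 mtu 1400
!
```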
When we compare these configs, the first line is the same -> no action, but we remember that we have entered the FastEthernet0/0/1 submode. The second line differs in value (the meta-information associated with the lines contains the path and the value). When we analyze the two lines, we determine that a value_set
has occurred. The default action when a value has been changed is to output the command for setting the new value, i.e. mtu 1400. However, we also need to reposition to the current submode: if this is the first line we are outputting in the submode, we need to issue the interface FastEthernet0/0/1 command before issuing the mtu 1400
command.
Similarly, suppose a value has been removed, i.e. mtu used to be set but it is no longer present:
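Schematically (running config first, then the new config):

```
interface FastEthernet0/0/1
 mtu 1400
!

interface FastEthernet0/0/1
!
```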
As before, the first lines are equivalent, but the second line has !
in the new config, and mtu 1400
in the running config. This is analyzed as a delete, and the following commands are generated:
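Schematically:

```
interface FastEthernet0/0/1
 no mtu 1400
!
```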
There are tweaks to this behavior. For example, some machines do not like the no command to include the old value, but instead want the command:
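That is, without the old value:

```
no mtu
```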
We can instruct the CLI diff engine to behave in this way by using the YANG annotation tailf:cli-no-value-on-delete:
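A sketch on the mtu leaf:

```yang
leaf mtu {
  type uint16;
  tailf:cli-no-value-on-delete;
}
```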
It is also possible to tell the CLI engine not to include the element name in the delete operation; for example, a password might be set with a command that includes the leaf name, while the command to delete it omits the name. The data model for this relies on the corresponding delete-handling annotation described in the tailf_yang_cli_extensions man page.
It is often necessary to make some minor modifications to the Java part of a CLI NED. There are mainly four functions that need to be modified: connect, show, applyConfig, and enter/exit config mode.
The CLI NED code should do a few things when the connect callback is invoked.
Set up a connection to the device (usually ssh).
If necessary, send a secondary password to enter exec mode. Typically, a Cisco IOS-like CLI requires the user to give the enable command followed by a password.
Verify that it is the right kind of device and respond to NSO with a list of capabilities. This is usually done by running the show version
command, or equivalent, and parsing the output.
Configure the CLI session on the device not to use pagination. This is normally done by setting the screen length to 0 (or infinity, or disable). Optionally, it may also adjust the idle time.
Some modifications may be needed in this section if the commands for the above differ from the Cisco IOS style.
The NSO will invoke the show()
callback multiple times, once for each top-level tag in the data model. Some devices have support for displaying just parts of the configuration; others do not.
For a device that cannot display only parts of a config, the recommended strategy is to wait for a show() invocation with a well-known top tag and send the entire config at that point. If you know that the data model has a top tag called interface, you can use code like:
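A minimal sketch, assuming session is the established CLI session and that readUntilPrompt() and stripHeadersAndFooters() are hypothetical helpers that collect and clean the device output:

```java
public void show(NedWorker worker, String toptag)
        throws NedException, IOException {
    if (toptag.equals("interface")) {
        // The device cannot limit the output to one section, so fetch
        // the entire running configuration when the well-known top tag
        // is requested.
        session.print("show running-config\n");
        String res = readUntilPrompt();        // hypothetical helper
        res = stripHeadersAndFooters(res);     // hypothetical helper
        worker.showCliResponse(res);
    } else {
        // The whole config was already sent; reply empty for other tags.
        worker.showCliResponse("");
    }
}
```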
From the point of view of NSO, it is perfectly OK to send the entire config as a response to one of the requested top tags and to send an empty response otherwise.
Often some filtering is required of the output from the device. For example, perhaps part of the configuration should not be sent to NSO, or some keywords replaced with others. Here are some examples:
Stripping Sections, Headers, and Footers
Some devices start the output from show running-config
with a short header, and some add a footer. Common headers are Current configuration:
and a footer may be end
or return
. In the example below, we strip out a header and remove a footer:
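A sketch in plain Java, assuming res holds the raw output of show running-config:

```java
private String stripHeadersAndFooters(String res) {
    // Drop everything up to and including the "Current configuration:" header.
    int start = res.indexOf("Current configuration:");
    if (start >= 0) {
        int nl = res.indexOf('\n', start);
        if (nl >= 0) {
            res = res.substring(nl + 1);
        }
    }
    // Drop the trailing "end" footer, if present.
    int footer = res.lastIndexOf("\nend");
    if (footer >= 0) {
        res = res.substring(0, footer + 1);
    }
    return res;
}
```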
Also, you may choose to only model part of a device configuration, in which case you can strip out the parts that you have not modeled. For example, stripping out the SNMP configuration:
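A sketch that drops all lines belonging to an unmodeled SNMP section (the matching is deliberately simplistic):

```java
private String stripSnmp(String res) {
    StringBuilder sb = new StringBuilder();
    for (String line : res.split("\n", -1)) {
        if (line.startsWith("snmp-server")) {
            continue; // not modeled, hide from NSO
        }
        sb.append(line).append('\n');
    }
    return sb.toString();
}
```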
Removing Keywords
Sometimes a device generates non-parsable commands in the output from show running-config
. For example, some A10 devices add a keyword cpu-process
at the end of the ip route
command. However, it does not accept this keyword when a route is configured. The solution is to simply strip the keyword before sending the config to NSO and to not include the keyword in the data model for the device. The code to do this may look like this:
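A sketch, assuming res holds the device output (the route line in the comment is made up):

```java
// e.g. "ip route 10.0.0.0 /8 10.1.2.3 cpu-process" on the device
res = res.replace(" cpu-process", "");
```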
Replacing Keywords
Sometimes a device has another name for delete than the standard no command found in a typical Cisco CLI. NSO will only generate no
commands when, for example, an element does not exist (i.e. no shutdown
for an interface), but the device may need undo
instead. This can be dealt with as a simple transformation of the configuration before sending it to NSO. For example:
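A sketch using a regular expression over the whole config ((?m) makes ^ match at every line start):

```java
// Translate the device's "undo" delete prefix into the modeled "no"
// prefix, preserving indentation, before handing the config to NSO.
res = res.replaceAll("(?m)^(\\s*)undo ", "$1no ");
```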
Another example is the following situation. A device has a configuration for port trunk permit vlan 1-3
and may at the same time have disallowed some VLANs using the command no port trunk permit vlan 4-6
. Since we cannot use a no
container in the config, we instead add a disallow
container, and then rely on the Java code to do some processing, e.g.:
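One possible shape of such a container (simplified):

```yang
container disallow {
  container port {
    container trunk {
      container permit {
        leaf-list vlan {
          type string;
        }
      }
    }
  }
}
```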
And in the Java show()
code:
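A sketch:

```java
// Rewrite the device's "no" form into the modeled "disallow" form
// before the configuration is handed to NSO.
res = res.replace("no port trunk permit vlan",
                  "disallow port trunk permit vlan");
```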
A similar transformation needs to take place when NSO sends a configuration change to the device. A more detailed discussion about applying config modifications follows later, but the corresponding code would in this case be:
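A sketch of the reverse transformation:

```java
// Rewrite the modeled "disallow" form back into the device's "no" form
// before the commands are sent to the device.
data = data.replace("disallow port trunk permit vlan",
                    "no port trunk permit vlan");
```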
Different Quoting Practices
If the way a device quotes strings differs from the way it can be modeled in NSO, it can be handled in the Java code. For example, suppose a device does not quote encrypted password strings, which may contain odd characters such as the comment character !. Java code to deal with this may look like:
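A sketch, assuming password lines have the form password <blob> (the pattern is an assumption):

```java
// Quote the unquoted password blob so special characters such as '!'
// survive NSO's parsing.
res = res.replaceAll("(?m)^(\\s*password )(\\S+)$", "$1\"$2\"");
```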
And similarly de-quoting when applying a configuration.
NSO will send the configuration to the device in three different callbacks: prepare()
, abort()
, and revert()
. The Java code should issue these commands to the device but some processing of the commands may be necessary. Also, the ongoing CLI session needs to enter configure mode, issue the commands, and then exit configure mode. Some processing may be needed if the device has different keywords, or different quoting, as described under the Displaying the Configuration of a Device section above.
For example, if a device uses undo in place of no, then the code may look like this, where data
is the string of commands received from NSO:
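A sketch; since NSO sends the commands without indentation, a simple line-start match suffices:

```java
data = data.replaceAll("(?m)^no ", "undo ");
```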
This relies on the fact that NSO will not have any indentation in the commands sent to the device (as opposed to the indentation usually present in the output from show running-config
).
The typical Cisco CLI has two major modes, operational mode and configure mode. In addition, the configure mode has submodes. For example, interfaces are configured in a submode that is entered by giving the command interface <InterfaceType> <Number>
. Exiting a submode, i.e. giving the exit command, leaves you in the parent mode. Submodes can also be embedded in other submodes.
In a typical Cisco CLI, you do not necessarily have to exit a submode to execute a command in a parent mode. In fact, the output of the command show running-config
hardly contains any exit commands. Instead, there is an exclamation mark, !
, to indicate that a submode is done; the exclamation mark is only a comment. The config is formatted to rely on the fact that if a command isn't found in the current submode, the CLI engine searches for the command in the parent mode.
Another interesting mapping problem is how to interpret the no command when multiple leaves are given on a command line. Consider the model:
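A stand-in model matching that syntax (annotations chosen to give optional in-sequence parameters):

```yang
container foo {
  presence "foo is configured";
  tailf:cli-sequence-commands;
  tailf:cli-compact-syntax;
  leaf a {
    tailf:cli-optional-in-sequence;
    type string;
  }
  leaf b {
    tailf:cli-optional-in-sequence;
    type string;
  }
  leaf c {
    tailf:cli-optional-in-sequence;
    type string;
  }
}
```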
It corresponds to the command syntax foo [a <word> [b <word> [c <word>]]]
, i.e. the following commands are valid: foo, foo a <word>, foo a <word> b <word>, and foo a <word> b <word> c <word>.
Now, what does it mean to write no foo a <word> b <word> c <word>? It could mean that only the c
leaf should be removed, or it could mean that all leaves should be removed, and it may also mean that the foo
container should be removed.
There is no clear principle here and no one right solution. The annotations are therefore necessary to help the diff engine figure out what to actually send to the device.
The full set of annotations can be found in the tailf_yang_cli_extensions
man page. Not all of them are applicable in an NSO context, but most are.
| Non-transactional devices | Transactional devices | Transactional devices with confirmed commit | Fully capable NETCONF server |
|---|---|---|---|
| SNMP, Cisco IOS, NETCONF devices with startup+running | Devices that can abort, NETCONF devices without confirmed commit | Cisco XR type of devices | ConfD, Junos |
INITIALIZE: The initialize phase is used to initialize a transaction. For instance, if locking or other transaction preparations are necessary, they should be performed here. This callback is not mandatory to implement if no NED-specific transaction preparations are needed.
Non-transactional devices: initialize(). NED code shall make the device go into config mode (if applicable) and lock (if applicable).
Transactional devices: initialize(). NED code shall start a transaction on the device.
Transactional devices with confirmed commit: initialize(). NED code shall do the equivalent of configure exclusive.
Fully capable NETCONF server: Built in; NSO will lock.
UNINITIALIZE: If the transaction is not completed and the NED has done INITIALIZE, this method is called to undo the transaction preparations, that is, to restore the NED to the state before INITIALIZE. This callback is not mandatory to implement if no NED-specific preparations were performed in INITIALIZE.
Non-transactional devices: uninitialize(). NED code shall unlock (if applicable).
Transactional devices: uninitialize(). NED code shall abort the transaction.
Transactional devices with confirmed commit: uninitialize(). NED code shall abort the transaction.
Fully capable NETCONF server: Built in; NSO will unlock.
PREPARE: In the prepare phase, the NEDs get exposed to all the changes that are destined for each managed device handled by each NED. It is the responsibility of the NED to determine the outcome here. If the NED replies successfully from the prepare phase, NSO assumes the device will be able to go through with the proposed configuration change.
Non-transactional devices: prepare(Data). NED code shall send all data to the device.
Transactional devices: prepare(Data). NED code shall add Data to the transaction and validate.
Transactional devices with confirmed commit: prepare(Data). NED code shall add Data to the transaction and validate.
Fully capable NETCONF server: Built in; NSO will edit-config towards the candidate, validate, and commit confirmed with a timeout.
ABORT: If any participants in the transaction reject the proposed changes, all NEDs will be invoked in the abort()
method for each managed device the NED handles. It is the responsibility of the NED to make sure that whatever was done in the PREPARE phase is undone. For NEDs that indicate in their reply to newConnection()
that they want the reverse diff, they will get the reverse data as a parameter here.
Non-transactional devices: abort(ReverseData | null). Either do the equivalent of copy startup to running, or apply the ReverseData to the device.
Transactional devices: abort(ReverseData | null). Abort the transaction.
Transactional devices with confirmed commit: abort(ReverseData | null). Abort the transaction.
Fully capable NETCONF server: Built in; discard-changes and close.
COMMIT: Once all NEDs that get invoked in commit(Timeout)
reply ok, the transaction is permanently committed to the system. The NED may still reject the change in COMMIT. If any NED rejects the COMMIT, all participants will be invoked in REVERT. NEDs that support confirmed commit with a timeout (such as Cisco XR) may choose to use the provided timeout to make REVERT easy to implement.
Non-transactional devices: commit(Timeout). Do nothing.
Transactional devices: commit(Timeout). Commit the transaction.
Transactional devices with confirmed commit: commit(Timeout). Execute commit confirmed [Timeout] on the device.
Fully capable NETCONF server: Built in; commit confirmed with the timeout.
REVERT: This state is reached if any NED reports failure in the COMMIT phase. Similar to the ABORT state, the reverse diff is supplied to the NED if the NED has asked for that.
Non-transactional devices: revert(ReverseData | null). Either do the equivalent of copy startup to running, or apply the ReverseData to the device.
Transactional devices: revert(ReverseData | null). Either do the equivalent of copy startup to running, or apply the ReverseData to the device.
Transactional devices with confirmed commit: revert(ReverseData | null). discard-changes.
Fully capable NETCONF server: Built in; discard-changes and close.
PERSIST: This state is reached at the end of a successful transaction. Here it is the responsibility of the NED to make sure that, if the device reboots, the changes are still there.
Non-transactional devices: persist(). Either do the equivalent of copy running to startup, or nothing.
Transactional devices: persist(). Either do the equivalent of copy running to startup, or nothing.
Transactional devices with confirmed commit: persist(). Confirm.
Fully capable NETCONF server: Built in; commit confirm.