Design large and scalable NSO applications using LSA.
Layered Service Architecture (LSA) is a design approach for massively large and scalable NSO applications. Large service providers and enterprises can use it to manage services for millions of users, ranging over several hundred thousand managed devices. Such scale requires special consideration since a single NSO instance no longer suffices and LSA helps you address this challenge.
At some point, scaling up hits the law of diminishing returns. Effectively, adding more resources to the NSO server becomes prohibitively expensive. To further increase the throughput of the whole system, you can share the load across multiple instances, in a scale-out fashion.
You achieve this by splitting a service into a main, upper-layer part, and one or more lower-layer parts. The upper part controls and dispatches work to the lower parts. This is the same approach as using a customer-facing service (CFS) and a resource-facing service (RFS). However, here the CFS code (the upper-layer part) runs in a different NSO node than the RFS code (the lower-layer parts). What is more, the lower-layer parts can be spread across multiple NSO nodes.
Each RFS node is responsible for its own set of managed devices, mounted under its /devices
tree, and the upper-layer, CFS node only concerns itself with the RFS nodes. So, the CFS node only mounts the RFS nodes under its /devices
tree, not managed devices directly. The main advantage of this architecture is that you can add many device RFS nodes that collectively manage a huge number of actual devices—much more than a single node could.
While it is tempting to design the system in the most scalable way from the start, it comes with a cost. Compared to a single, non-LSA setup, the automation system now becomes distributed across multiple nodes, with all the complexity that entails. For example, in a non-distributed system, the communication between different parts has mostly negligible latency and hardly ever fails. That is certainly not true anymore for distributed systems as we know them today, including LSA.
More practically, taking a service in NSO and deploying a single instance on an LSA system is likely to take longer and have a higher chance of failure compared to a non-LSA system, because additional network communication is involved.
Moreover, multiple NSO nodes present a higher operational complexity and administrative burden. There is no longer a “single pane of glass” view of all the individual devices. That's why you must weigh the benefits of the LSA approach against the scale at which you operate. When LSA starts making sense will depend on the type of devices you manage, the services you have, the geographical distribution of resources, and so on.
A distributed system can push the overall throughput way beyond what a single instance can do. But you will achieve a much better outcome by first focusing on eliminating the bottlenecks in the provisioning code, as discussed in Scaling and Performance Optimization. Only when that proves insufficient, consider deploying LSA.
LSA also addresses the memory limitations of NSO when device configurations become very large (individually or all together). If the NSO server is memory-constrained and more memory cannot be added, the LSA approach can be a solution.
Another challenge that LSA may help you overcome is scaling organizationally. When many teams share the same NSO instance, it can get hard to separate the different concerns and responsibilities. Teams may also have different cadences or preferences for upgrades, resulting in friction. With LSA, it becomes possible to create a clearer separation. The CFS node and the RFS nodes can have different release cycles (as long as the YANG upgrade rules are followed) and each can be upgraded independently. If a bug is found or a feature is missing in the RFS nodes, it can be fixed without affecting the CFS node, and vice versa.
To summarize, the major advantage of this architecture is scalability. The solution scales horizontally, both at the upper and the lower layer, thus catering for truly massive deployments, but at the expense of the increased complexity.
To take advantage of the scalability potential of LSA, your services must be designed in a layered fashion. Once the automation logic in NSO reaches a certain level of complexity, a stacked service design tends to emerge naturally. Often, you can extend it to LSA with relatively little change. The same is true for brand-new, green field designs.
In other situations, you might need to invest some additional effort to split and orchestrate the work across multiple groups of devices. Examples are existing monolithic services or stacked service designs that require all RFSs to access all devices.
If you are designing the service from scratch, you have the most freedom in choosing the partitioning of logic between CFS and RFS. The CFS must contain the YANG definition for the service and its configurable options that are available to the customer, perhaps through an order capture system north of the NSO. On the other hand, the RFS YANG models are internal to the service, that is, they are not used directly by the customer. So, you are free to design them in a way that makes the provisioning code as simple as possible.
As an example, you might have a VLAN provisioning service where the CFS lets users select if the hosts on the VLAN can access the internet. Then you can divide provisioning into, let's say, an RFS service that configures the VLAN and the appropriate IP subnet across the data center switches, and another RFS service that configures the firewall to allow the traffic from the subnet to reach the internet. This design clearly separates the provisioned devices into two groups: firewalls and data center switches. Each group can be managed by a separate lower-layer NSO.
Similar to a brand new design, an existing monolithic application that uses stacked services has already laid the groundwork for LSA-compatible design because of the existing division into two layers (upper and lower).
A possible complication, in this case, is when each existing RFS touches all of the affected devices, and that makes it hard to partition devices across multiple lower-layer NSO nodes. For example, if one RFS manages the VLAN interface (the VLAN ID and layer 2 settings) and another RFS manages the IP configuration for this interface, that configuration very likely happens on the same devices. The solution in this situation could be to partition RFS services based on the data center that they operate in, such as one lower-layer NSO node for one data center, another lower-layer NSO for another data center, and so on. If that is not possible, an alternative is to redesign each RFS and split their responsibilities differently.
The most complex, yet common case is when a single node NSO installation grows over time and you are faced with performance problems due to the new size. To leverage the LSA functionality, you must first split the service into upper- and lower-layer parts, which require a certain amount of effort. That is why the decision to use LSA should always be accompanied by a thorough analysis to determine what makes the system too slow. Sometimes, it is a result of a bad "must" expression in the service YANG code or similar. Fixing that is much easier than re-architecting the application.
Regardless of whether you start with a green field design or extend an existing application, you must tackle the problem of dispatching the RFS instantiation to the correct lower-layer NSO node.
Imagine a VPN application that uses a managed device on each site to securely connect to the private network. In a service provider network, this is usually done by the CPE. When a customer orders connectivity to an additional site (another leg of the VPN), the service needs to configure the site-local device (the CPE). As there will be potentially many such devices, each will be managed by one of the RFS nodes. However, the VPN service is managed centrally, through the CFS, which must:
Figure out which RFS node is responsible for the device for the new site (CPE).
Dispatch the RFS instantiation to that particular RFS node, making sure the device is properly configured.
NSO provides a mechanism to facilitate the second part, the actual dispatch, but the service logic must somehow select the correct RFS node. If the RFS nodes are geographically separated across different countries or different data centers, the CFS could simply infer or calculate the right RFS node based on service instance parameters, such as the physical location of the new site.
A more flexible alternative is to use dynamic mapping. It can be as simple as a list of 2-tuples that map a device name to an RFS node, stored in the CDB. The trade-off is that the list must be maintained. It is straightforward to automate the maintenance of the list though, for example through NETCONF notifications whenever /devices/device
on the RFS nodes is manipulated or by explicitly asking the CFS node to query the RFS nodes for their list of devices.
Ultimately, the right approach to dispatch will depend on the complexity of your service and operational procedures.
Having designed a layered service with the CFS and RFS parts, the CFS must now communicate with the RFS that resides on a different node. You achieve that by adding the lower-layer (RFS) node as a managed device to the upper-layer (CFS) node. The CFS node must access the RFS data model on the lower-layer node, just like it accesses any other configuration on any managed device. But don't you need a NED to do this? Indeed, you do. That's why the RFS model needs to be specially compiled for the upper-layer node to use as part of NED and not a standalone service. A model compiled in this way is called a 'device compiled'.
Let's then see how the LSA setup affects the whole service provisioning process. Suppose a new request arrives at the CFS node, such as a new service instance being created through RESTCONF by a customer order portal. The CFS runs the service mapping logic as usual; however, instead of configuring the network devices directly, the CFS configures the appropriate RFS nodes with the generated RFS service instance data. This is the dispatch logic in action.
As the configuration for the lower-layer nodes happens under the /devices/device
tree, it is picked up and pushed to the relevant NSO instances by the NED. The NED sends the appropriate NETCONF edit-config RPCs, which trigger the RFS FASTMAP code at the RFS nodes. The RFS mapping logic constructs the necessary network configuration for each RFS instance and the RFS nodes update the actual network devices.
In case the commit queue feature is not being used, this entire sequence is serialized through the system as a whole. It means that if another northbound request arrives at the CFS node while the first request is being processed, the second request is synchronously queued at the CFS node, waiting for the currently running transaction to either succeed or fail.
If the code on the RFS nodes is reactive, it will likely return without much waiting, since the RFM applications are usually very fast during their first round of execution. But that will still have a lower performance than using the commit queue since the execution is serialized eventually when modifying devices. To maximize throughput, you also need to enable the commit queue functionality throughout the system.
The main benefit of LSA is that it scales horizontally at the RFS node layer. If one RFS node starts to become overloaded, it's easy to bring up an additional one, to share the load. Thus LSA caters to scalability at the level of the number of managed devices. However, each RFS node needs to host all the RFSs that touch the devices it manages under its /devices/device
tree. There is still one, and only one, NSO node that directly manages a single device.
Dividing a provisioning application into upper and lower-layer services also increases the complexity of the application itself. For example, to follow the execution of a reactive or nano RFS, typically an additional NETCONF notification code must be written. The notifications have to be sent from the RFS nodes and received and processed by the CFS code. This way, if something goes wrong at the device layer, the information is relayed all the way to the top level of the system.
Furthermore, it is highly recommended that LSA applications enable the commit queue on all NSO nodes. If the commit queue is not enabled, the slowest device on the network will limit the overall throughput, significantly reducing the benefits of LSA.
Finally, if the two-layer approach proves to be insufficient due to requirements at the CFS node, you can extend it to three layers, with an additional layer of NSO nodes between the CFS and RFS layers.
This section describes a small LSA application, which exists as a running example in the examples.ncs/layered-services-architecture/lsa-single-version-deployment directory.
The application is a slight variation on the examples.ncs/service-management/rfs-service example where the YANG code has been split up into an upper-layer and a lower-layer implementation. The example topology (based on netsim for the managed devices, and NSO for the upper/lower layer NSO instances) looks like the following:
The upper layer of the YANG service data for this example looks like the following:
Instantiating one CFS we have:
The provisioning code for this CFS has to make a decision on where to instantiate what. In this example the "what" is trivial, it's the accompanying RFS, whereas the "where" is more involved. The two underlying RFS nodes, each manage 3 netsim routers, thus given the input, the CFS code must be able to determine which RFS node to choose. In this example, we have chosen to have an explicit map, thus on the upper-nso
we also have:
So, we have a template CFS code that does the dispatching to the right RFS node.
This technique for dispatching is simple and easy to understand. The dispatching might be more complex, it might even be determined at execution time dependent on CPU load. It might be (as in this example) inferred from input parameters or it might be computed.
The result of the template-based service is to instantiate the RFS, at the RFS nodes.
First, let's have a look at what happened in the upper-nso. Look at the modifications but ignore the fact that this is an LSA service:
Just the dispatched data is shown. As ex0
and ex5
reside on different nodes, the service instance data has to be sent to both lower-nso-1
and lower-nso-2
.
Now let's see what happened in the lower-nso
. Look at the modifications and take into account that these are LSA nodes (this is the default):
Both the dispatched data and the modification of the remote service are shown. As ex0
and ex5
reside on different nodes, the service modifications of the service rfs-vlan
on both lower-nso-1
and lower-nso-2
are shown.
The communication between the NSO nodes is of course NETCONF.
The YANG model at the lower layer, also known as the RFS layer, is similar to the CFS, but slightly different:
The task for the RFS provisioning code here is to actually provision the designated router. If we log into one of the lower layer NSO nodes, we can check the following.
To conclude this section, the final remark here is that to design a good LSA application, the trick is to identify a good layering for the service data models. The upper layer, the CFS layer is what is exposed northbound, and thus requires a model that is as forward-looking as possible since that model is what a system north of NSO integrates to, whereas the lower layer models, the RFS models can be viewed as "internal system models" and they can be more easily changed.
In this section, we'll describe a lightly modified version of the example in the previous section. The application we describe here exists as a running example under examples.ncs/layered-services-architecture/lsa-scaling.
Sometimes it is desirable to be able to easily move devices from one lower LSA node to another. This makes it possible to easily expand or shrink the number of lower LSA nodes. Additionally, it is sometimes desirable to avoid HA pairs for replication but instead use a common store for all lower LSA devices, such as a distributed database, or a common file system.
The above is possible provided that the LSA application is structured in certain ways.
The lower LSA nodes only expose services that manipulate the configuration of a single device. We call these devices RFSs, or dRFS for short.
All services are located in a way that makes it easy to extract them, for example in /drfs:dRFS/device
No RFS takes place on the lower LSA nodes. This avoids the complication with locking and distributed event handling.
The LSA nodes need to be set up with the proper NEDs and with auth groups such that a device can be moved without having to install new NEDs or update auth groups.
Provided that the above requirements are met, it is possible to move a device from one lower LSA node by extracting the configuration from the source node and installing it on the target node. This, of course, requires that the source node is still alive, which is normally the case when HA-pairs are used.
An alternative to using HA-pairs for the lower LSA nodes is to extract the device configuration after each modification to the device and store it in some central storage. This would not be recommended when high throughput is required but may make sense in certain cases.
In the example application, there are two packages on the lower LSA nodes that provide this functionality. The package inventory-updater
installs a database subscriber that is invoked every time any device configuration is modified, both in the preparation phase and in the commit phase of any such transaction. It extracts the device and dRFS configuration, including service metadata, during the preparation phase. If the transaction proceeds to a full commit, the package is again invoked and the extracted configuration is stored in a file in the directory db_store
.
The other package is called device-actions
. It provides three actions: extract-device
, install-device
, and delete-device
. They are intended to be used by the upper LSA node when moving a device either from a lower LSA node or from db_store
.
In the upper LSA node, there is one package for coordinating the movement, called move-device
. It provides an action for moving a device from one lower LSA node to another. For example when invoked to move device ex0
from lower-1
to lower-2
using the action
it goes through the following steps:
A partial lock is acquired on the upper-nso for the path /devices/device[name=lower-1]/config/dRFS/device[name=ex0]
to avoid any changes to the device while the device is in the process of being moved.
The device and dRFS configuration are extracted in one of two ways:
Read the configuration from lower-1
using the action
Read the configuration from some central store, in our case the file system in the directory. db_store
.
The configuration will look something like this
Install the configuration on the lower-2
node. This can be done by running the action:
This will load the configuration and commit using the flags no-deploy
and no-networking
.
Delete the device from lower-1
by running the action
Update mapping table
Release the partial lock for /devices/device[name=lower-1]/config/dRFS/device[name=ex0]
.
Re-deploy all services that have touched the device. The services all have backpointers from /devices/device{lower-1}/config/dRFS/device{ex0}
. They are re-deployed
using the flags no-lsa
and no-networking
.
Finally, the action runs compare-config
on lower-1
and lower-2
.
With this infrastructure in place, it is fairly straightforward to implement actions for re-balancing devices among lower LSA nodes, as well as evacuating all devices from a given lower LSA node. The example contains implementations of those actions as well.
If we do not have the luxury of designing our NSO service application from scratch, but rather are faced with extending/changing an existing, already deployed application into the LSA architecture we can use the techniques described in this section.
Usually, the reasons for rearchitecting an existing application are performance-related.
In the NSO example collection, two popular examples are the examples.ncs/service-management/mpls-vpn-java and examples.ncs/service-management/mpls-vpn-python examples. Those example contains an almost "real" VPN provisioning example whereby VPNs are provisioned in a network of CPEs, PEs, and P routers according to this picture:
The service model in this example roughly looks like this:
There are several interesting observations on this model code related to the Layered Service Architecture.
Each instantiated service has a list of endpoints and CPE routers. These are modeled as a leafref into the /devices tree. This has to be changed if we wish to change this application into an LSA application since the /devices tree at the upper layer doesn't contain the actual managed routers. Instead, the /devices tree contains the lower layer RFS nodes.
There is no connectivity/topology information in the service model. Instead, the mpls-vpn
example has topology information on the side, and that data is used by the provisioning code. That topology information for example contains data on which CE routers are directly connected to which PE router.
Remember from the previous section, that one of the additional complications of an LSA application is the dispatching part. The dispatching problem fits well into the pattern where we have topology information stored on the side and let the provisioning FASTMAP code use that data to guide the provisioning. One straightforward way would be to augment the topology information with additional data, indicating which RFS node is used to manage a specific managed device.
By far the easiest way to change an existing monolithic NSO application into the LSA architecture is to keep the service model at the upper layer and lower layer almost identical, only changing things like leafrefs directly into the /devices tree which obviously breaks.
In this example, the topology information is stored in a separate container share-data
and propagated to the LSA nodes by means of service code.
The example examples.ncs/layered-services-architecture/mpls-vpn-lsa example does exactly this, the upper layer data model in upper-nso/packages/l3vpn/src/yang/l3vpn.yang
now looks as:
The ce-device
leaf is now just a regular string, not a leafref.
So, instead of an NSO topology that looks like:
We want an NSO architecture that looks like this:
The task for the upper layer FastMap code is then to instantiate a copy of itself on the right lower layer NSO nodes. The upper layer FastMap code must:
Determine which routers, (CE, PE, or P) will be touched by its execution.
Look in its dispatch table, which lower-layer NSO nodes are used to host these routers.
Instantiate a copy of itself on those lower layer NSO nodes. One extremely efficient way to do that is to use the Maapi.copyTree()
method. The code in the example contains code that looks like this:
Finally, we must make a minor modification to the lower layer (RFS) provisioning code too. Originally, the FastMap code wrote all config for all routers participating in the VPN, now with the LSA partitioning, each lower layer NSO node is only responsible for the portion of the VPN that involves devices that reside in its /devices tree, thus the provisioning code must be changed to ignore devices that do not reside in the /devices tree.
In addition to conceptual changes of splitting into upper- and lower-layer parts, migrating an existing monolithic application to LSA may also impact the models used. In the new design, the upper-layer node contains the (more or less original) CFS model as well as the device-compiled RFS model, which it requires for communication with the RFS nodes. In a typical scenario, these are two separate models. So, for example, they must each use a unique namespace.
To illustrate the different YANG files and namespaces used, the following text describes the process of splitting up an example monolithic service. Let's assume that the original service resides in a file, myserv.yang
, and looks like the following:
In an LSA setting, we want to keep this module as close to the original as possible. We clearly want to keep the namespace, the prefix, and the structure of the YANG identical to the original. This is to not disturb any provisioning systems north of the original NSO. Thus with only minor modifications, we want to run this module at the CFS node, but with non-applicable leafrefs removed, thus at the CFS node we would get:
Now, we want to run almost the same YANG module at the RFS node, however, the namespace must be changed. For the sake of the CFS node, we're going to NED compile the RFS and NSO doesn't like the same namespace to occur twice, thus for the RFS node, we would get a YANG module myserv-rfs.yang
that looks like the following:
This file can, and should, keep the leafref as is.
The final and last file we get is the compiled NED, which should be loaded in the CFS node. The NED is directly compiled from the RFS model, as an LSA NED.
Thus, we end up with three distinct packages from the original one:
The original, slated for the CFS node, with leafrefs removed.
The modified original, slated for the RFS node, with the namespace and the prefix changed.
The NED, compiled from the RFS node code, slated for the CFS node.
The purpose of the upper CFS node is to manage all CFS services and to push the resulting service mappings to the RFS services. The lower RFS nodes are configured as devices in the device tree of the upper CFS node and the RFS services are created under the /devices/device/config
accordingly. This is almost identical to the relation between a normal NSO node and the normal devices. However, there are differences when it comes to commit parameters and the commit queue, as well as some other LSA-specific features.
Such a design allows you to decide whether you will run the same version of NSO on all nodes or not. Since some differences arise between the two options, this document distinguishes a single-version deployment from a multi-version one.
Deployment of an LSA cluster where all the nodes have the same major version of NSO running is called a single version deployment. If the versions are different, then it is a multi-version deployment, since the packages on the CFS node must be managed differently.
The choice between the two deployment options depends on your functional needs. The single version is easier to maintain and is a good starting point but is less flexible. While it is possible to migrate from one to the other, the migration from a single version to a multi-version is typically easier than the other way around. Still, every migration requires some effort, so it is best to pick one approach and stick to it.
You can find working examples of both deployment types in the examples.ncs/layered-services-architecture/lsa-single-version-deployment and examples.ncs/layered-services-architecture/lsa-multi-version-deployment folders, respectively.
The type of deployment does not affect the RFS nodes. In general, the RFS nodes act very much like ordinary standalone NSO instances but only support the RFS services.
Configure and set up the lower RFS nodes as you would a standalone node, by making sure the necessary NED and RFS packages are loaded and the managed network devices added. This requires you to have already decided on the distribution of devices to lower RFS nodes. The RFS packages are ordinary service packages.
The only LSA-specific requirement is that these nodes enable NETCONF communication northbound, as this is how the upper CFS node will interact with them. To enable NETCONF northbound, ensure that a configuration similar to the following is present in the ncs.conf
of every RFS node:
One thing to note is that you do not need to explicitly enable the commit queue on the RFS nodes, even if you intend to use LSA with the commit queue feature. The upper CFS node is aware of the LSA setup and will propagate the relevant commit flags to the lower RFS nodes automatically.
If you wish to enable the commit queue by default, that is, even for transactions originating on the RFS node (non-LSA), you are strongly encouraged to enable it globally, through the /devices/global-settings/commit-queue/enabled-by-default
setting on all the RFS nodes and, importantly, the upper CFS node too. Otherwise, you may end up in a situation where only a part of the transaction runs through the commit queue. In that case, the rollback-on-error
commit queue error option will not work correctly, as it can't roll back the full original transaction but just the part that went through the commit queue. This can result in an inconsistent network state.
Regardless of single or multi-version deployment, the upper CFS node has the lower RFS nodes configured as devices under the /devices/device
tree. The CFS node communicates with these devices through NETCONF and must have the correct ned-id
configured for each lower RFS node. The ned-id
is set under /devices/device/device-type/netconf/ned-id
, as for any NETCONF device.
The part that is specific to LSA is the actual ned-id
used. This has to be ned:lsa-netconf
or a ned-id
derived from it. What is more, the ned-id
depends on the deployment type. For a single-version deployment, you can use the lsa-netconf
value directly. This ned-id
is built-in (defined in tailf-ncs-ned.yang
) and available in NSO without any additional packages.
So the configuration for the RFS device in the CFS node would look similar to:
Notice the use of the lsa-remote-node
instead of the address
(and port
) as is usually done. This setting identifies the device as a lower-layer LSA node and instructs NSO to use connection information provided under cluster
configuration.
The value of lsa-remote-node
references a cluster remote-node
, such as the following:
In addition to devices device
, the authgroup
value is again required here and refers to cluster authgroup
, not the device one. Both authgroups must be configured correctly for LSA to function.
Having added device and cluster configuration for all RFS nodes, you should update the SSH host keys for both, the /devices/device
and /cluster/remote-node
paths. For example:
Moreover, the RFS NSO nodes have an extra configuration that may not be visible to the CFS node, resulting in out-of-sync behavior. You are strongly encouraged to set the out-of-sync-commit-behaviour
value to accept
, with a command such as:
At the same time you should also enable the /cluster/device-notifications
, which will allow the CFS node to receive the forwarded device notifications from the RFS nodes, and /cluster/commit-queue
, to enable the commit queue support for LSA. Without the latter, you will not be able to use the commit commit-queue async
command, for example.
If you wish to enable the commit queue by default, you should do so by setting the /devices/global-settings/commit-queue/enabled-by-default
on the CFS node. Do not use per device or per device group configuration, for the same reason you should avoid it on the RFS nodes.
If you plan a single-version deployment, the preceding steps are sufficient. For a multi-version deployment, on the other hand, there are two additional tasks to perform.
First, you will need to install the correct Cisco-NSO LSA NED package (or packages if you need to support more versions). Each NSO release includes these packages that are specifically tailored for LSA. They are used by the upper CFS node if the lower RFS nodes are running a different version than the CFS node itself. The packages are named cisco-nso-nc-X.Y
where X.Y are the two most significant numbers of the NSO release (the major version) that the package supports. So, if your RFS nodes are running NSO 5.7.2, for example, you should use cisco-nso-nc-5.7
.
These packages are found in the $NCS_DIR/packages/lsa
directory. Each package contains the complete model of the ncs
namespace for the corresponding NSO version, compiled as an LSA NED. Please always use the cisco-nso
package included with the NSO version of the upper CFS node and not some older variant (such as the one from the lower RFS node) as it may not work correctly.
Second, installing the cisco-nso LSA NED package will make the corresponding ned-id
available, such as cisco-nso-nc-5.7
(ned-id
matches the package name). Use this ned-id
for the RFS nodes instead of lsa-netconf
. For example:
This configuration allows the CFS node to communicate with a different NSO version but there are still some limitations. The upper CFS node must have the same or newer version than the managed RFS nodes. For all the currently supported versions of the lower node, the packages can be found in the $NCS_DIR/packages/lsa
directory, but you may also be able to build an older one yourself.
In case you already have a single-version deployment using the lsa-netconf
ned-id'
s, you can use the NED migrate procedure to switch to the new ned-id
and multi-version deployment.
Besides adding managed lower-layer nodes, the upper-layer node also requires packages for the services. Obviously, you must add the CFS package, which is an ordinary service package, to the CFS node. But you must also provide the device compiled RFS YANG models to allow provisioning of RFSs on the remote RFS nodes.
The process resembles the way you create and compile device YANG models in normal NED packages. The ncs-make-package
tool provides the --lsa-netconf-ned
option, where you specify the location of the RFS YANG model and the tool creates a NED package for you. This is a new package that is separate from the RFS package used in the RFS nodes, so you might want to name it differently to avoid confusion. The following text uses the -ned
suffix.
Usually, you would also provide the --no-netsim
, --no-java
, and --no-python
switches to the invocation, as the package is used with the NETCONF protocol and doesn't need any additional code. The --no-netsim
option is required because netsim is not supported for these types of packages. For example:
In this case, there is no explicit --lsa-lower-nso
option specified and ncs-make-package
will by default be set up to compile the package for the single version deployment, tied to the lsa-netconf
ned-id
. That means the models in the NED can be used with devices that have a lsa-netconf
ned-id
configured.
To compile it for the multi-version deployment, which uses a different ned-id
, you must select the target NSO version with the --lsa-lower-nso cisco-nso-nc-X.Y
option, for example:
Depending on the RFS model, the package may fail to compile, even though the model compiles fine as a service. A typical error would indicate some node from a module, such as tailf-ncs
, is not found. The reason is that the original RFS service YANG model has dependencies on other YANG models that are not included in the compilation process.
One solution to this problem is to remove the dependencies in the YANG model before compilation. Normally this can be solved by changing the datatype in the NED compiled copy of the YANG model, for example from leafref
or instance-identifier
to string. This is only needed for the NED compiled copy, the lower RFS node YANG model can remain the same. There will then be an implicit conversion between types, at runtime, in the communication between the upper CFS node and the lower RFS node.
An alternate solution, if you are doing a single version deployment and there are dependencies on the tailf-ncs
namespace, is to switch to a multi-version deployment because the cisco-nso
package includes this namespace (device compiled). Here, the NSO versions match but you are still using the cisco-nso-nc-X.Y
ned-id
and have to follow the instructions for the multi-version deployment.
Once you have both, the CFS and device-compiled RFS service packages are ready; add them to the CFS node, then invoke a sync-from
action to complete the setup process.
You can see all the required setup steps for a single version deployment performed in the example examples.ncs/layered-services-architecture/lsa-single-version-deployment and the examples.ncs/layered-services-architecture/lsa-multi-version-deployment has the steps for the multi-version one. The two are quite similar but the multi-version deployment has additional steps, so it is the one described here.
First, build the example for manual setup.
Then configure the nodes in the cluster. This is needed so that the upper CFS node can receive notifications from the lower RFS node and prepare the upper CFS node to be used with the commit queue.
To be able to handle the lower NSO node as an LSA node, the correct version of the cisco-nso-nc
package needs to be installed. In this example, 5.4 is used.
Create a link to the cisco-nso
package in the packages directory of the upper CFS node:
Reload the packages:
Now when the cisco-nso-nc
package is in place, configure the two lower NSO nodes and sync-from
them:
Now, for example, the configured devices of the lower nodes can be viewed:
Or, alarms inspected:
Now, create a netconf package on the upper CFS node which can be used towards the rfs-vlan
service on the lower RFS node, in the shell terminal window, do the following:
The created NED is an lsa-netconf-ned
based on the YANG files of the rfs-vlan
service:
The version of the NED reflects the version of the nso on the lower node:
The package will be generated in the packages directory of the upper NSO CFS node:
And, the name of the package will be:
Install the cfs-vlan
service on the upper CFS node. In the shell terminal window, do the following:
Reload the packages once more to get the cfs-vlan
package. In the CLI terminal window, do the following:
Now, when all packages are in place a cfs-vlan
service can be configured. The cfs-vlan
service will dispatch service data to the right lower RFS node depending on the device names used in the service.
In the CLI terminal window, verify the service:
As ex0
resides on lower-nso-1
that part of the configuration goes there and the ex5
part goes to lower-nso-2
.
Since an LSA deployment consists of multiple NSO nodes (or HA pairs of nodes), each can be upgraded to a newer NSO version separately. While that offers a lot of flexibility, it also makes upgrades more complex in many cases. For example, performing a major version upgrade on the upper CFS node only will make the deployment Multi-Version even if it was Single-Version before the upgrade, requiring additional action on your part.
In general, staying with the Single-Version Deployment is the simplest option and does not require any further LSA-specific upgrade action (except perhaps recompiling the packages). However, the main downside is that, at least for a major upgrade, you must upgrade all the nodes at the same time (otherwise, you no longer have a Single-Version Deployment).
If that is not feasible, the solution is to run a Multi-Version Deployment. Along with all of the requirements, the section Multi-Version Deployment describes a major difference from the Single Version variant: the upper CFS node uses a version-specific cisco-nso-nc-X.Y
NED ID to refer to lower RFS nodes. That means, if you switch to a Multi-Version Deployment, or perform a major upgrade of the lower-layer RFS node, the ned-id
should change accordingly. However, do not change it directly but follow the correct NED upgrade procedure described in the section called NED Migration. Briefly, the procedure consists of these steps:
Keep the currently configured ned-id for an RFS device and the corresponding packages. If upgrading the CFS node, you will need to recompile the packages for the new NSO version.
Compile and load the packages that are device-compiled with the new ned-id
, alongside the old packages.
Use the migrate
action on a device to switch over to the new ned-id
.
The procedure requires you to have two versions of the device-compiled RFS service packages loaded in the upper CFS node when calling the migrate
action: one version compiled by referencing the old (current) NED ID and the other one by referencing the new (target) NED ID.
To illustrate, suppose you currently have an upper-layer and a lower-layer node both running NSO 5.4. The nodes were set up as described in the Single Version Deployment option, with the upper CFS node using the tailf-ncs-ned:lsa-netconf
NED ID for the lower-layer RFS node. The CFS node also uses the rfs-vlan-ned
NED package for the rfs-vlan
service.
Now you wish to upgrade the CFS node to NSO 5.7 but keep the RFS node on the existing version 5.4. Before upgrading the CFS node, you create a backup and recompile the rfs-vlan-ned
package for NSO 5.7. Note that the package references the lsa-netconf
ned-id
, which is the ned-id
configured for the RFS device in the CFS node's CDB. Then, you perform the CFS node upgrade as usual.
At this point the CFS node is running the new, 5.7 version and the RFS node is running 5.4. Since you now have a Multi Version Deployment, you should migrate to the correct ned-id
as well. Therefore, you prepare the rfs-vlan-nc-5.4
package, as described in the Multi-Version Deployment option, compile the package, and load it into the CFS node. Thanks to the NSO CDM feature, both packages, rfs-vlan-nc-5.4
and rfs-vlan-ned
, can be used at the same time.
With the packages ready, you execute the devices device lower-nso-1 migrate new-ned-id cisco-nso-nc-5.4
command on the CFS node. The command configures the RFS device entry on CFS to use the new cisco-nso-nc-5.4 ned-id
, as well as migrates the device configuration and service meta-data to the new model. Having completed the upgrade, you can now remove the rfs-vlan-ned
if you wish.
Later on, you may decide to upgrade the RFS node to NSO 5.6. Again, you prepare the new rfs-vlan-nc-5.6
package for the CFS node in a similar way as before, now using the cisco-nso-nc-5.6
ned-id instead of cisco-nso-nc-5.4
. Next, you perform the RFS node upgrade to 5.6 and finally migrate the RFS device on the CFS node to the cisco-nso-nc-5.6 ned-id
, with the migrate action.
Likewise, you can return to the Single Version Deployment, by upgrading the RFS node to the NSO 5.7, reusing the old, or preparing anew, the rfs-vlan-ned
package and migrating to the lsa-netconf ned-id
.
All these ned-id
changes stem from the fact that the upper-layer CFS node treats the lower-layer RFS node as a managed device, requiring the correct model, just like it does for any other device type. For the same reason, maintenance (bug fix or patch) NSO upgrades do not result in a changed ned-id
, so for those no migration is necessary.
In LSA, northbound users are authenticated on the CFS, and the request is re-authenticated on the RFS using either a system user or user/pass passthrough.
For token-based authentication using external auth/package auth, this becomes a problem as the user and password are not expected to be locally provisioned and hence cannot be used for authentication towards the RFS, which leaves the option of a system user.
Using a system user has two major limitations:
Auditing on the RFS becomes hard, as system sessions are not logged in the audit.log
.
Device-level RBAC becomes challenging as the devices reside in the RFS and the user information is lost.
To handle this scenario, one can enable the passthrough of the user name and its groups to lower layer nodes to allow the session on the RFS to assume the same user as used on the CFS (similar to use of "sudo"). This will allow for the use of a system user between the CFS and RFS while allowing for auditing and RBAC on the RFS using the locally authenticated user on the CFS.
On the CFS node, create an authgroup under /devices/authgroups/group
with the /devices/authgroups/group/{umap,default-map}/passthrough
empty leaf set, then select this authgroup on the configured RFS nodes by setting the /devices/device/authgroup
leaf. When the passthrough leaf is set and a user (e.g., alice) on the CFS node connects to an RFS node, she will authenticate using the credentials specified in the /devices/device/authgroup
authgroup (e.g., lsa_passthrough_user
: ahVaesai8Ahn0AiW
). Once the authentication completes successfully, the user lsa_passthrough_user
changes into alice on the RFS node.
On the RFS node, configure the mapping of permitted users in the /cluster/global-settings/passthrough/permit
list. The key of the permit list specifies what user may change into a different user. The different possible users to change into are specified by the as-user
leaf-list, and the as-group
leaf-list specifies valid groups. The user will end up with the intersection of groups in the user session on the CFS and the groups specified by the as-group
leaf-list. Only users in the permit list will be allowed to change into the users set in the permit list elements as-user
list.