Services Deep Dive
Deep dive into service implementation.
Before you Proceed
This section discusses the implementation details of services in NSO. The reader should already be familiar with the concepts described in the introductory sections and Implementing Services.
For an introduction to services, see Develop a Simple Service instead.
Common Service Model
Each service type in NSO extends a part of the data model (a list or a container) with the ncs:servicepoint statement and the ncs:service-data grouping. This is what defines an NSO service.
The service point instructs NSO to involve the service machinery (Service Manager) for management of that part of the data tree, while the ncs:service-data grouping contains definitions common to all services in NSO. Defined in tailf-ncs-services.yang, ncs:service-data includes parts that are required for the proper operation of FASTMAP and the Service Manager. Every service must therefore use this grouping as part of its data model.
In addition, ncs:service-data provides a common service interface to the users, consisting of service operational data (such as the list of modified devices) and a set of service actions (such as check-sync, re-deploy, and un-deploy).
While not part of ncs:service-data as such, you may consider the service-commit-queue-event notification part of the core service interface. The notification provides information about the state of the service when the service uses the commit queue. As an example, an event-driven application uses this notification to find out when a service instance has been deployed to the devices. See the showcase_rc.py script in examples.ncs/scaling-performance/perf-stack for sample Python code leveraging the notification. See tailf-ncs-services.yang for the full definition of the notification.
NSO Service Manager is responsible for providing the functionality of the common service interface, requiring no additional user code. This interface is the same for classic and nano services, whereas nano services further extend the model.
Services and Transactions
NSO calls into the Service Manager when accessing actions and operational data under the common service interface, or when the service instance configuration data (the data under the service point) changes. Since NSO is a transactional system, configuration data changes happen in a transaction.
When applied, a transaction goes through multiple stages, as shown by the progress trace (e.g. using commit | details in the CLI). The detailed output breaks up the transaction into four distinct phases:
validation
write-start
prepare
commit
These phases deal with how the network-wide transactions work:
The validation phase prepares and validates the new configuration (including the NSO copy of device configurations), then the CDB processes the changes and prepares them for local storage in the write-start phase.
The prepare stage sends out the changes to the network through the Device Manager and the HA system. The changes are staged (e.g. in the candidate data store) and validated if the device supports it; otherwise, the changes are activated immediately.
If all systems accept the new configuration, the transaction enters the commit phase, marking the new NSO configuration as active and activating or committing the staged configuration on remote devices. Otherwise, it enters the abort phase, discarding the changes and asking NEDs to revert activated changes on devices that do not support transactions (e.g. those without a candidate data store).
There are also two types of locks involved with the transaction that are of interest to the service developer: the service write lock and the transaction lock. The latter is a global lock, required to serialize transactions, while the former is a per-service-type lock for serializing services that cannot be run in parallel. See Scaling and Performance Optimization for more details on these locks and their impact on performance.
The first phase, historically called validation, does more than just validate data and is the phase a service deals with the most. The other three support the NSO service framework, but a service developer rarely interacts with them directly.
We can further break down the first phase into the following stages:
rollback creation
pre-transform validation
transforms
full data validation
conflict check and transaction lock
When the transaction starts applying, NSO captures the initial intent and creates a rollback file, which allows one to reverse or roll back the intent. For example, the rollback file might contain the information that you changed a service instance parameter but it would not contain the service-produced device changes.
Then the first, partial validation takes place. It ensures the service input parameters are valid according to the service YANG model, so the service code can safely use provided parameter values.
Next, NSO runs transaction hooks and performs the necessary transforms, which alter the data before it is saved, for example encrypting passwords. This is also where the Service Manager invokes FASTMAP and service mapping callbacks, recording the resulting changes. NSO takes service write locks in this stage, too.
After transforms, there are no more changes to the configuration data, and the full validation starts, including YANG model constraints over the complete configuration, custom validation through validation points, and configuration policies (see Policies in Operation and Usage).
Throughout the phase, the transaction engine makes checkpoints, so it can restart the transaction faster in case of concurrency conflicts. The check for conflicts happens at the end of this first phase when NSO also takes the global transaction lock. Concurrency is further discussed in NSO Concurrency Model.
Service Callbacks
The main callback associated with a service point is the create callback, designed to produce the required (new) configuration, while FASTMAP takes care of the other operations, such as update and delete.
NSO implements two additional, optional callbacks for scenarios where create is insufficient. These are pre- and post-modification callbacks that NSO invokes before (pre) or after (post) create. These callbacks work outside of the scope tracked by FASTMAP. That is, changes done in pre- and post-modification do not automatically get removed during the update or delete of the service instance.
For example, you can use the pre-modification callback to check the service prerequisites (pre-check) or make changes that you want persisted even after the service is removed, such as enabling some global device feature. The latter may be required when NSO is not the only system managing the device and removing the feature configuration would break non-NSO managed services.
Similarly, you might use post-modification to reset the configuration to some default after the service is removed. Say the service configures an interface on a router for customer VPN. However, when the service is deprovisioned (removed), you don't want to simply erase the interface configuration. Instead, you want to put it in shutdown and configure it for a special, unused VLAN. The post-modification callback allows you to achieve this goal.
The main difference from the create callback is that pre- and post-modification are called on update and delete, as well as on service create. Since the service data node may no longer exist in the case of delete, the API for these callbacks does not supply the service object. Instead, the callback receives the operation and key path to the service instance. See the following API signatures for details.
The Python callbacks use the following function arguments:
tctx: A TransCtxRef object containing transaction data, such as user session and transaction handle information.
op: An integer representing the operation: create (ncs.dp.NCS_SERVICE_CREATE), update (ncs.dp.NCS_SERVICE_UPDATE), or delete (ncs.dp.NCS_SERVICE_DELETE) of the service instance.
kp: A HKeypathRef object with the key path of the affected service instance, such as /svc:my-service{instance1}.
root: A Maagic node for the root of the data model.
service: A Maagic node for the service instance.
proplist: Opaque service properties, see Persistent Opaque Data.
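To make the signatures concrete, a Python service class registering these callbacks could look like the following sketch. It uses the standard ncs.application.Service decorators; the class name and log messages are illustrative only.

```python
import ncs
from ncs.application import Service


class IfaceCallbacks(Service):
    @Service.create
    def cb_create(self, tctx, root, service, proplist):
        self.log.info('creating ', service._path)
        return proplist

    @Service.pre_modification
    def cb_pre_modification(self, tctx, op, kp, root, proplist):
        # op is ncs.dp.NCS_SERVICE_CREATE, NCS_SERVICE_UPDATE, or
        # NCS_SERVICE_DELETE; kp is the keypath of the service instance.
        # There is no service object here, as the instance may be gone.
        if op == ncs.dp.NCS_SERVICE_DELETE:
            self.log.info('service is being deleted: ', str(kp))
        return proplist

    @Service.post_modification
    def cb_post_modification(self, tctx, op, kp, root, proplist):
        self.log.info('post-modification for ', str(kp))
        return proplist
```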
The Java callbacks use the following function arguments:
context: A ServiceContext object for accessing root and service instance NavuNode in the current transaction.
operation: A ServiceOperationType enum representing the operation: CREATE, UPDATE, or DELETE of the service instance.
path: A ConfPath object with the key path of the affected service instance, such as /svc:my-service{instance1}.
ncsRoot: A NavuNode for the root of the ncs data model.
service: A NavuNode for the service instance.
opaque: Opaque service properties, see Persistent Opaque Data.
See examples.ncs/service-management/iface-postmod-py and examples.ncs/service-management/iface-postmod-java examples for a sample implementation of the post-modification callback.
Additionally, you may implement these callbacks with templates. Refer to Service Callpoints and Templates for details.
Persistent Opaque Data
FASTMAP greatly simplifies service code, so it usually only needs to deal with the initial mapping. NSO achieves this by first discarding all the configuration performed during the create callback of the previous run. In other words, the service create code always starts anew, with a blank slate.
If you need to keep some private service data across runs of the create callback, or pass data between callbacks, such as pre- and post-modification, you can use opaque properties.
The opaque object is available in the service callbacks as an argument, typically named proplist (Python) or opaque (Java). It contains a set of named properties with their corresponding values.
If you wish to use the opaque properties, it is crucial that your code returns the properties object from the create call; otherwise, the service machinery will not save the new version.
Unlike the changes made in pre- and post-modification callbacks, which also persist outside of FASTMAP, the opaque data is deleted when the service instance is deleted.
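As a minimal sketch, a create callback using the opaque might look like this; the property name and helper function are made up for illustration.

```python
from ncs.application import Service


class Callbacks(Service):
    @Service.create
    def cb_create(self, tctx, root, service, proplist):
        # proplist is a list of (name, value) tuples of strings
        props = dict(proplist)
        if 'allocated-id' not in props:
            # computed on the first run only, then reused on every re-deploy
            props['allocated-id'] = str(compute_id(service))  # hypothetical helper
        # ... use props['allocated-id'] in the mapping ...
        return list(props.items())
```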
The examples.ncs/service-management/iface-postmod-py and examples.ncs/service-management/iface-postmod-java examples showcase the use of opaque properties.
Defining Static Service Conflicts
NSO by default enables concurrent scheduling and execution of services to maximize throughput. However, concurrent execution can be problematic for non-thread-safe services or services that are known to always conflict with themselves or other services, such as when they read and write the same shared data. See NSO Concurrency Model for details.
To prevent NSO from scheduling a service instance together with an instance of another service, declare a static conflict in the service model, using the ncs:conflicts-with extension. The following example shows a service with two declared static conflicts, one with itself and one with another service, named other-service.
This means each service instance will wait for other service instances that have started sooner than this one (and are of the example-service or other-service type) to finish before proceeding.
Reference Counting Overlapping Configuration
FASTMAP knows that a particular piece of configuration belongs to a service instance, allowing NSO to revert the change as needed. But what happens when several service instances share a resource that may or may not exist before the first service instance is created? If the service implementation naively checks for existence and creates the resource when it is missing, then the resource will be tracked with the first service instance only. If, later on, this first instance is removed, then the shared resource is also removed, affecting all other instances.
A well-known solution to this kind of problem is reference counting. NSO uses reference counting by default with the XML templates and the Python Maagic API, while in the Java Maapi and Navu APIs, the sharedCreate(), sharedSet(), and sharedSetValues() functions need to be used.
When enabled, the reference counter allows the FASTMAP algorithm to keep track of the usage and only delete data when the last service instance referring to it is removed.
Furthermore, containers and list items created using the sharedCreate() and sharedSetValues() functions also get an additional attribute called backpointer. (This functionality is currently not available for individual leafs.)
The backpointer points back to the service instance that created the entity in the first place. This makes it possible to look at a part of the configuration, say under the /devices tree, and answer the question: which parts of the device configuration were created by which service?
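As an illustration, here is a sketch of service code (with assumed service and device model paths) where the created interface may overlap with another service instance. In a Python service context, Maagic create and set are shared operations, so the node is reference counted and gets a backpointer rather than being owned outright by a single instance.

```python
from ncs.application import Service


class IfaceCallbacks(Service):
    @Service.create
    def cb_create(self, tctx, root, service, proplist):
        dev = root.devices.device[service.device]
        # In a service context this create() is a shared create: if another
        # service instance already created the interface, its refcount is
        # increased and a backpointer to this instance is added, instead of
        # a second copy of the node being created.
        intf = dev.config.interface.GigabitEthernet.create(service.interface)
        intf.description = 'Managed by NSO'
```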
To see reference counting in action, start the examples.ncs/service-management/implement-a-service/iface-v3 example with make demo and configure a service instance.
Then configure another service instance with the same parameters and use the display service-meta-data pipe to show the reference counts and backpointers:
Notice how commit dry-run produces no new device configuration but the system still tracks the changes. If you wish, remove the first instance and verify that the GigabitEthernet 0/1 configuration is still there, but is gone when you also remove the second one.
But what happens if the two services produce different configurations for the same node? Say, one sets the IP address to 10.1.2.3 and the other to 10.1.2.4. Conceptually, these two services are incompatible, and instantiating both at the same time produces a broken configuration (instantiating the second service instance breaks the configuration for the first). What is worse is that the current configuration depends on the order the services were deployed or re-deployed. For example, re-deploying the first service will change the configuration from 10.1.2.4 back to 10.1.2.3 and vice versa. Such inconsistencies break the declarative configuration model and really should be avoided.
In practice, however, NSO does not prevent services from producing such configuration. But note that we strongly recommend against it and that there are associated limitations, such as service un-deploy not reverting configuration to that produced by the other instance (but when all services are removed, the original configuration is still restored).
The commit | debug service pipe command warns about any such conflict that it finds but may miss conflicts on individual leafs. The best practice is to use integration tests in the service development life cycle to ensure there are no conflicts, especially when multiple teams develop their own set of services that are to be deployed on the same NSO instance.
Stacked Services
Much like a service in NSO can provision device configurations, it can also provision other, non-device data, including other services. We call the approach of services provisioning other services 'service stacking' and the services that are involved 'stacked' services.
Service stacking concepts usually come into play for bigger, more complex services. There are a number of reasons why you might prefer stacked services to a single monolithic one:
Smaller, more manageable services with simpler logic.
Separation of concerns and responsibility.
Clearer ownership across teams for (parts of) overall service.
Smaller services, reusable as components across the solution.
Avoiding overlapping configuration between service instances that would cause conflicts, for example by using one service instance per device (see examples in Designing for Maximal Transaction Throughput).
Stacked services are also the basis for LSA, which takes this concept even further. See Layered Service Architecture for details.
The standard naming convention with stacked services distinguishes between a Resource-Facing Service (RFS), that directly configures one or more devices, and a Customer-Facing Service (CFS), that is the top-level service, configuring only other services, not devices. There can be more than two layers of services in the stack, too.
While NSO does not prevent a single service from configuring devices as well as services, in the majority of cases this results in a less clean design and is best avoided.
Overall, creating stacked services is very similar to the non-stacked approach. First, you can design the RFS services as usual. Actually, you might take existing services and reuse those. These then become your lower-level services, since they are lower in the stack.
Then you create a higher-level service, say a CFS, that configures another service, or a few, instead of a device. You can even use a template-only service to do that, such as:
The preceding example references an existing iface service, such as the one in the examples.ncs/service-management/implement-a-service/iface-v3 example. The output shows hard-coded values, but you can change those as you would for any other service.
In practice, you might find it beneficial to modularize your data model and potentially reuse parts in both the lower- and higher-level services. This avoids duplication while still allowing you to directly expose some of the lower-level service functionality through the higher-level model.
The most important principle to keep in mind is that the data created by any service is owned by that service, regardless of how the mapping is done (through code or templates). If the user deletes a service instance, FASTMAP will automatically delete whatever the service created, including any other services. Likewise, if the operator directly manipulates service data that is created by another service, the higher-level service becomes out of sync. The check-sync service action checks this for services as well as devices.
In stacked service design, the lower-level service data is under the control of the higher-level service and must not be directly manipulated. Only the higher-level service may manipulate that data. However, two higher-level services may manipulate the same structures, since NSO performs reference counting (see Reference Counting Overlapping Configuration).
Stacked Service Design
Designing services in NSO offers a great deal of flexibility with multiple approaches available to suit different needs. But what’s the best way to go about it? At its core, a service abstracts a network service or functionality, bridging user-friendly inputs with network configurations. This definition leaves the implementation open-ended, providing countless possibilities for designing and building services. However, there are certain techniques and best practices that can help enhance performance and simplify ongoing maintenance, making your services more efficient and easier to manage.
Regardless of the type of service chosen—whether Java, Python, or plain template services—there are certain design patterns that can be followed to improve their long-term effectiveness. Rather than diving into API-level specifics, we’ll focus on higher-level design principles, with an emphasis on leveraging the stacked service approach for maximum efficiency and scalability.
Service Performance
When designing a service, the first step is to identify the functionality of the network service and the corresponding device configurations it encompasses. The service should then be designed to generate those configurations. These configurations can either be static—hard-coded into the service if they remain consistent across all instances—or dynamic, represented as variables that adapt based on the service’s input parameters.
The flexibility in service design is virtually limitless, as both Java and Python can be used to define services, allowing for the generation of static or dynamic configurations based on minimal input. Ultimately, the goal is to have the service efficiently represent as much of the required device configuration as possible, while minimizing the number of input parameters.
When striving to achieve the goal of producing comprehensive device configurations, it's common to end up with a service that generates an extensive set of configurations. At first glance, this might seem ideal; however, it can introduce significant performance challenges.
Service Bottlenecks
As the volume of a service's device configurations increases, its performance often declines. Both creating and modifying the service take longer, regardless of whether the change involves a single line of configuration or the entire set. In fact, the execution time of the service remains consistent for all modifications and increases proportionally with the size of the configurations it generates.
The underlying reason for this behavior is tied to FASTMAP. Without delving too deeply into its mechanics, FASTMAP essentially runs the service logic anew with every deploy or re-deploy (modification), regenerating all the device configurations from scratch. This process not only re-executes user-defined logic—whether in Java, Python, or templates—but also tasks NSO with generating the reverse diffset for the service. As the size of the reverse diffset grows, so does the computational load, leading to slower performance.
From this, it's clear that writing efficient service logic is crucial. Optimizing the time complexity of operations within the service callbacks will naturally improve performance, just as with any other software. However, there's a less obvious yet equally important factor to consider: minimizing the service diffset. A smaller diffset results in better performance overall.
At first glance, this might seem to contradict the initial goal of representing as much configuration as possible with minimal input parameters. This apparent conflict is where the concept of stacked services comes into play, offering a way to balance these priorities effectively.
We want a service to generate as much configuration as possible, but it doesn’t need to handle everything on its own. While a single service becomes slower as it takes on more, distributing the workload across multiple services introduces a new dimension of optimization.
For example, consider a simple service that configures interface descriptions. While not a real network service, it serves as a useful illustration of the impact of heavy operations and large diffsets. Let's explore how this approach can help optimize performance.
Each service instance will take, as input, a list of devices to configure and the number of interfaces to be configured for each device.
The callback will then iterate through each provided device, creating interfaces and assigning descriptions in a loop.
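The original code listing is not reproduced here; as a sketch, such a create callback might look like the following, with an assumed service model (a devices list with a number-of-interfaces leaf) and an assumed IOS-like device model.

```python
from ncs.application import Service


class PythonServiceCallbacks(Service):
    @Service.create
    def cb_create(self, tctx, root, service, proplist):
        # One service instance writes the configuration for every device and
        # every interface, so the diffset grows with the total interface count.
        for dev in service.devices:
            config = root.devices.device[dev.name].config
            for i in range(dev.number_of_interfaces):
                intf = config.interface.GigabitEthernet.create('0/%d' % i)
                intf.description = 'Interface 0/%d of %s' % (i, service.name)
```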
When evaluating the service's performance, there are two key aspects to consider: the callback execution time and the time NSO takes to calculate the diffset. To analyze these, we can use NSO’s progress trace to gather statistics. Let’s start with an example involving three devices and 10 interfaces:
The two key events we need to focus on are the create event for the service, which provides the execution time of the create callback, and the "saving reverse diff-set and applying changes" event, which shows how long NSO took to calculate the reverse diff-set.
Let’s capture the same data for 100 and 1000 interfaces to compare the results.
We can observe that the time scales proportionally with the workload in the create callback as well as the size of the diffset. To demonstrate that the time remains consistent regardless of the size of the modification, we add one more interface to the 1000 interfaces already configured.
From the progress trace, we can see that adding one interface took about the same amount of time as adding 1000 interfaces.
FASTMAP offers significant benefits to our solution, but this performance trade-off is an unavoidable cost. As a result, our service will remain consistently slow for all modifications as long as it handles large-scale device configurations. To address this, our focus must shift to reducing the size of the device configuration.
Service Stacking
The solution lies in distributing the configurations across multiple services while assigning the main service the role of managing these individual services. By analyzing the current service's functionality, we can easily identify how to break it down—by device. Instead of having a single service provisioning multiple devices, we will transition to a setup where one main service provisions multiple sub-services, with each sub-service responsible for provisioning a single device. The resulting structure will look as follows.
We'll begin by renaming our python-service to upper-python-service. This distinction is purely for clarity and to differentiate the two service types. In practice, the naming itself is not critical, as long as it aligns with the desired naming conventions for the northbound API, which represents the customer-facing service. The upper-python-service will still function as the main service that users interact with to configure interfaces on multiple devices, just as in the previous example.
The upper-python-service, however, will not provision any devices directly. Instead, it will delegate that responsibility to another layer of services by creating and managing those subordinate services.
The lower-python-service will be created by the upper-python-service and will ultimately handle provisioning the device. This service is designed to take only a single device as input, which corresponds to the device it will provision. The behavior and interaction between the two services can be observed in the Python callbacks that define their logic.
The upper service creates a lower service for each device, and each lower service is responsible for provisioning its assigned device and populating its interfaces. This approach distributes the workload, reducing the load on individual services. The upper service loops over the total number of devices and generates a diffset consisting of the input parameters for each lower service. Each lower service then loops over the interfaces for its specific device and creates a diffset covering all interfaces for that device.
All of this happens within a single NSO transaction, ensuring that, from the user’s perspective, the behavior remains identical to the previous design.
At this point, you might wonder: if this still occurs in a single transaction and the total number of loops and combined diffset size remain unchanged, how does this improve performance? That’s a valid observation. When creating a large dataset all at once, this approach doesn’t provide a performance gain—in fact, the addition of an extra service layer might introduce a minimal and negligible amount of overhead.
However, the real benefit becomes apparent in update scenarios, as we’ll illustrate below.
We begin by creating the service to configure 1000 interfaces for each device.
The execution time of the upper-python-service turned out to be relatively low, as expected. This is because it only involves a loop with three iterations, where data is passed from the input of the upper-python-service to each corresponding lower-python-service.
Similarly, calculating the diffset is also efficient. The reverse diffset for the upper-python-service only includes the configuration for the lower-python-services, which consists of just a few lines. This minimal complexity keeps both execution time and diffset calculation fast and lightweight.
In the same transaction, we also observe the execution of the three lower-python-services.
Each service callback took approximately 8 seconds to execute, and calculating the diffset took around 2.5 seconds per service. This results in a total callback execution time of about 24 seconds and a total diffset calculation time of around 8 seconds, which is less than the time required in the previous service design.
So, what’s the advantage of stacking services like this? The real benefit becomes evident during updates. Let’s add an interface to device CE-1, just as we did with the previous design, to illustrate this.
Observing the progress trace generated for this scenario gives a clearer understanding. From the trace, we see that the upper-python-service was invoked and executed just as quickly as it did during the initial deployment. The same applies to the callback execution and diffset calculation time for the lower-python-service handling CE-1.
But what about CE-2 and PE-1? Interestingly, there are no traces of these services in the log. That’s because they were never executed. The modification was passed only to the relevant lower-python-service for CE-1, while the other two services remained untouched.
And that is the power of stacked services.
Resource-Facing Layer
Does this mean the more we stack, the better? Should every single line of configuration be split into its own service? The answer is no. In most real-world cases, the primary performance bottleneck is the diffset calculation rather than the callback execution time. Service callbacks typically aren't computationally intensive, nor should they be.
Stacked services are generally used to address issues with diffset calculation, and this strategy is only effective if we can reduce the diffset size of the "hottest" service. However, increasing the number of services managed by the upper service also increases the total configuration it must generate on each re-deploy. This trade-off needs careful consideration to strike the right balance.
Modeling the Layer
When restructuring a service into a stacked service model, the first target should always be devices. If a service configures multiple devices, it’s a good practice to split it up by adding another layer of services, ensuring that no more than one device is provisioned by any service at the lowest layer. This approach reduces the service's complexity, making it easier to maintain.
Focusing on a single device per service also provides significant advantages in various scenarios, such as restoring consistency when a device goes out of sync, handling NED migrations, hardware upgrades, or even migrating a device between NSO instances.
The lower service we created uses the device name as its key. The primary reason for this is to ensure a clear separation of service instances based on the devices they are deployed on. One key benefit of this approach is the ability to easily identify all services deployed on a specific device by simply filtering for that device. For example, after adding a few more services, you could list all services associated with a particular device using a show command similar to the following.
While the complete distribution of the service looks like this:
This approach provides an excellent way to maintain an overview of services deployed on each device. However, introducing new service types presents a challenge: you wouldn’t be able to see all service types with a single show command. For instance, show lower-python-service ... will only display instances of the lower-python-service. But what happens when the device also has L2VPNs, L3VPNs, or other service types, as it would in a real network?
Organizing the Schema
To address this, we can nest the services within another list. By organizing all services under a common structure, we enable the ability to view and manage multiple service types for a device in a unified manner, providing a comprehensive overview with a single command.
To illustrate this approach, we need to introduce another service type. Moving beyond the dummy example, let’s use a more realistic scenario: the mpls-vpn-simple example. We'll refactor this service to adopt the stacked service approach while maintaining the existing customer-facing interface.
After the refactor, the service will shift from provisioning multiple devices directly through a single instance to creating a separate service instance for each device, VPN, and endpoint; these are what we call resource-facing services. These resource-facing services will be structured so that all device-specific services are grouped under a node for each device.
This is accomplished by introducing a list of devices, modeled within a separate package. We’ll create this new package and call it resource-facing-services, with the following model definition:
This model allows us to organize services by device, providing a unified structure for managing and querying all services deployed on each device.
Each element in this list will represent a device and all the services deployed on it. The model itself is empty, which is intentional, as each resource-facing service (RFS) will be added to this list through augmentation from its respective package. The YANG model for the RFS version of our L3VPN service is designed specifically to integrate seamlessly into this structure.
We deploy an L3VPN to our network with two CE endpoints by creating the following l3vpn customer-facing service.
After deploying our service, we can quickly gain an overview of the services deployed on a device without needing to analyze or reverse-engineer its configurations. For example, we can see that the device PE-1 is acting as a PE for two different endpoints within a VPN.
CE-1 serves as a CE for that VPN.
And CE-2 serves as another CE for that VPN.
Caveats and Best Practices
This section lists some specific advice for implementing services, as well as any known limitations you might run into.
You may also obtain some useful information by using the debug service commit pipe command, such as commit dry-run | debug service. The command displays the net effect of the service create code and issues warnings about potentially problematic usage of overlapping shared data.
Service callbacks must be deterministic: NSO invokes service callbacks in a number of situations, such as for dry-run, check sync, and actual provisioning. If a service does not create the same configuration from the same inputs, NSO sees it as being out of sync, resulting in a lot of configuration churn and making it incompatible with many NSO features. If you need to introduce some randomness or rely on some other nondeterministic source of data, make sure to cache the values across callback invocations, such as by using opaque properties (see Persistent Opaque Data) or persistent operational data (see Operational Data) populated in a pre-modification callback.
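As a sketch of that pattern, a pre-modification callback might generate a random value once and cache it in the opaque, keeping the create callback deterministic (the property name here is made up):

```python
import uuid

import ncs
from ncs.application import Service


class Callbacks(Service):
    @Service.pre_modification
    def cb_pre_modification(self, tctx, op, kp, root, proplist):
        props = dict(proplist)
        if op != ncs.dp.NCS_SERVICE_DELETE and 'token' not in props:
            # generated once, then reused by every subsequent create callback
            props['token'] = uuid.uuid4().hex
        return list(props.items())

    @Service.create
    def cb_create(self, tctx, root, service, proplist):
        token = dict(proplist)['token']
        # ... use token deterministically in the mapping ...
        return proplist
```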
Never overwrite service inputs: Service input parameters capture client intent and a service should never change its own configuration. Such behavior not only muddles the intent but is also temporary when done in the create callback, as the changes are reverted on the next invocation.
If you need to keep some additional data that cannot be easily computed each time, consider using opaque properties (see Persistent Opaque Data) or persistent operational data (see Operational Data) populated in a pre-modification callback.
No service ordering in a transaction: NSO is a transactional system and as such does not have the concept of order inside a single transaction. That means NSO does not guarantee any specific order in which the service mapping code executes if the same transaction touches multiple service instances. Likewise, your code should not make any assumptions about running before or after other service code.
Return value of create callback: The create callback is not the exclusive user of the opaque object; the object can be chained in several different callbacks, such as pre- and post-modification. Therefore, returning None/null from the create callback is not a good practice. Instead, always return the opaque object even if the create callback does not use it.
Avoid delete in service create: Unlike creation, deleting configuration does not support reference counting, as there is no data left to reference count. This means the deleted elements are tied to the service instance that deleted them.
Additionally, FASTMAP must store the entire deleted tree and restore it on every service change or re-deploy, only to be deleted again. Depending on the amount of deleted data, this is potentially an expensive operation.
So, a general rule of thumb is to never use delete in service create code. If an explicit delete is used, debug service may display the following warning:
However, the service may also delete data implicitly, through when and choice statements in the YANG data model. If a when statement evaluates to false, the configuration tree below that node is deleted. Likewise, if a case is set in a choice statement, the previously set case is deleted. This has the same limitations as an explicit delete.
To avoid these issues, create a separate service that only handles deletion, and use it in the main service through the stacked service design (see Stacked Services). This approach allows you to reference count the deletion operation and contain the effect of restoring deleted data through a small, rarely-changing helper service. See examples.ncs/service-management/shared-delete for an example.
Alternatively, you might consider pre- and post-modification callbacks for some specific cases.
Prefer shared*() functions: Non-shared create and set operations in the Java and Python low-level APIs do not add reference counts or backpointer information to changed elements. In case there is overlap with another service, unwanted removal can occur. See Reference Counting Overlapping Configuration for details.
In general, you should prefer sharedCreate(), sharedSet(), sharedSetValues(), and loadConfigCmds(). If non-shared variants are used in a shared context, debug service displays a warning, such as:
Likewise, do not use other MAAPI load_config variants from the service code. Use the loadConfigCmds() or sharedSetValues() function to load XML data from a file or a string. See examples.ncs/scaling-performance/perf-bulkcreate for an example.
Reordering ordered-by-user lists: If the service code rearranges an ordered-by-user list with items that were created by another service, that other service becomes out of sync. In some cases, you might be able to avoid out-of-sync scenarios by leveraging special XML template syntax (see Operations on ordered lists and leaf-lists) or using service stacking with a helper service. In general, however, you should reconsider your design and try to avoid such scenarios.
Automatic upgrade of keys for existing services is unsupported: Service backpointers, described in Reference Counting Overlapping Configuration, rely on the keys that the service model defines to identify individual service instances. If you update the model by adding, removing, or changing the type of leafs used in the service list key, while there are deployed service instances, the backpointers will not be automatically updated. Therefore, it is best to not change the service list key.
A workaround, if the service key absolutely must change, is to first perform a no-networking undeploy of the affected service instances, then upgrade the model, and finally no-networking re-deploy the previously un-deployed services.
Avoid conflicting intents: Consider that a service is executed as part of a transaction. If, in the same transaction, the service gets conflicting intents, for example, it gets modified and deleted, the transaction is aborted. You must decide which intent has higher priority and design your services to avoid such situations.
Service Discovery and Import
A very common situation, when NSO is deployed in an existing network, is that the network already has services implemented. These services may have been deployed manually or through an older provisioning system. To take full advantage of the new system, you should consider importing the existing services into NSO. The goal is to use NSO to manage existing service instances, along with adding new ones in the future.
The process of identifying services and importing them into NSO is called Service Discovery and can be broken down into the following high-level parts:
Implementing the service to match existing device configuration.
Enumerating service instances and their parameters.
Amending the service metadata references with reconciliation.
Ultimately, the problem that service discovery addresses is one of referencing or linking configuration to services. Since the network already contains target configuration, a new service instance in NSO produces no changes in the network. This means the new service in NSO by default does not own the network configuration. One side effect is that removing a service will not remove the corresponding device configuration, which is likely to interfere with service modification as well.
Some of the steps in the process can be automated, while others are mostly manual. The amount of work differs a lot depending on how structured and consistent the original deployment is.
Matching Configuration
A prerequisite (or possibly the product in an iterative approach) is an NSO service that supports all the different variants of the configuration for the service that are used in the network. This usually means there will be a few additional parameters in the service model that allow selecting the variant of device configuration produced, as well as some covering other non-standard configurations (if such configuration is present).
In the simplest case, there is only one variant and that is the one that the service needs to produce. Let's take the examples.ncs/service-management/implement-a-service/iface-v2-py example and consider what happens when a device already has an existing interface configuration.
Configuring a new service instance does not produce any new device configuration (notice that device c1 has no changes).
However, when committed, NSO records the changes, just like in the case of overlapping configuration (see Reference Counting Overlapping Configuration). The main difference is that there is only a single backpointer, to the newly configured service, but the refcount is 2. The other item that contributes to the refcount is the original device configuration, which is why the configuration is not deleted when the service instance is.
Enumerating Instances
A prerequisite for service discovery to work is that it is possible to construct a list of the already existing services. Such a list may exist in an inventory system, an external database, or perhaps just an Excel spreadsheet.
You can import the list of services in a number of ways. If you are reading it in from a spreadsheet, a Python script using NSO API directly (Basic Automation with Python) and a module to read Excel files is likely a good choice.
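For example, a sketch of such a script, assuming the inventory has been exported to CSV and an iface-like service with interface and ip-address parameters:

```python
import csv

import ncs


def import_services(csv_file):
    # Create one service instance per inventory row in a single transaction.
    with open(csv_file) as f, \
            ncs.maapi.single_write_trans('admin', 'system') as t:
        root = ncs.maagic.get_root(t)
        for row in csv.DictReader(f):
            svc = root.iface.create(row['name'])
            svc.interface = row['interface']
            svc.ip_address = row['ip-address']
        t.apply()


if __name__ == '__main__':
    import_services('services.csv')
```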
Or, you might generate an XML data file to import using the ncs_load command; use the display xml filter to help you create a template:
Regardless of the way you implement the data import, you can run into two kinds of problems.
On one hand, the service list data may be incomplete. Suppose that the earliest service instances deployed did not take the network mask as a parameter. Moreover, for some specific reasons, a number of interfaces had to deviate from the default of 28 and that information was never populated back in the inventory for old services after the netmask parameter was added.
Now the only place where that information is still kept may be the actual device configuration. Fortunately, you can access it through NSO, which may allow you to extract the missing data automatically, for example:
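Here is one possible sketch in Python, reading the netmask from the NSO copy of the device configuration; the device and model paths are assumptions based on the iface example.

```python
import ncs

# For each existing service instance, look up the mask actually configured
# on the corresponding interface in the NSO copy of the device config.
with ncs.maapi.single_read_trans('admin', 'system') as t:
    root = ncs.maagic.get_root(t)
    for svc in root.iface:
        intf = (root.devices.device[svc.device].config
                .interface.GigabitEthernet[svc.interface])
        print(svc.name, intf.ip.address.primary.mask)
```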
On the other hand, some parameters may be NSO specific, such as those controlling which variant of configuration to produce. Again, you might be able to use a script to find this information, or it could turn out that the configuration is too complex to make such a script feasible.
In general, this can be the most tricky part of the service discovery process, making it very hard to automate. It all comes down to how good the existing data is. Keep in mind that this exercise is typically also a cleanup exercise, and every network will be different.
Reconciliation
The last step is updating the metadata, telling NSO that a given service controls (owns) the device configuration that was already present when the NSO service was configured. This is called reconciliation and you achieve it using a special re-deploy reconcile action for the service.
Let's examine the effects of this action on the following data:
Having run the action, NSO has updated the refcount to remove the reference to the original device configuration:
What is more, the reconcile algorithm works even if multiple service instances share configuration. What if you had two instances of the iface service, instead of one?
Before reconciliation, the device configuration would show a refcount of three.
Invoking re-deploy reconcile on either one or both of the instances makes the services sole owners of the configuration.
This means the device configuration is removed only when you remove both service instances.
The reconcile operation only removes the references to the original configuration (without the service backpointer), so you can execute it as many times as you wish. Just note that it is part of a service re-deploy, with all the implications that brings, such as potentially deploying new configuration to devices when you change the service template.
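If you have many instances to reconcile, you can also invoke the action programmatically; the following is a sketch in Python, where the service path, instance name, and exact action input structure are assumptions to verify against your model.

```python
import ncs

with ncs.maapi.single_read_trans('admin', 'system') as t:
    root = ncs.maagic.get_root(t)
    svc = root.iface['instance1']
    inp = svc.re_deploy.get_input()
    inp.reconcile.create()   # reconcile with the default mode
    svc.re_deploy(inp)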
As an alternative to re-deploy reconcile, you can initially add the service configuration with a commit reconcile variant, performing the reconciliation right away.
Iterative Approach
It is hard to design a service in one go when you wish to cover existing configurations that are exceedingly complex or have a lot of variance. In such cases, many prefer an iterative approach, where you tackle the problem piece-by-piece.
Suppose there are two variants of the service configured in the network: iface-v2-py and the newer iface-v3, which produces a slightly different configuration. This is a typical scenario when a different (non-NSO) automation system is used and the service gradually evolves over time. Or, when a Method of Procedure (MOP) is updated if manual provisioning is used.
We will tackle this scenario to show how you might perform service discovery in an iterative fashion. We shall start with iface-v2-py as the first iteration of the iface service, which represents what configuration the service should produce to the best of our current knowledge.
There are configurations for two service instances in the network already: for interfaces 0/1 and 0/2 on the c1 device. So, configure the two corresponding iface instances.
You can also use the commit no-deploy variant to add service parameters when a normal commit would produce device changes, which you do not want.
Then use the re-deploy reconcile { discard-non-service-config } dry-run command to observe the difference between the service-produced configuration and the one present in the network.
For instance1, the config is the same, so you can safely reconcile it already.
But interface 0/2 (instance2), which you suspect was initially provisioned with the newer version of the service, produces the following:
The output tells you that the service is missing the ip dhcp snooping trust part of the interface configuration. Since the service does not generate this part of the configuration yet, running re-deploy reconcile { discard-non-service-config } (without dry-run) would remove the DHCP trust setting. This is not what we want.
One option, and this is the default reconcile mode, would be to use keep-non-service-config instead of discard-non-service-config. But that would result in the service taking ownership of only part of the interface configuration (the IP address).
Instead, the right approach is to add the missing part to the service template. There is, however, a little problem. Adding the DHCP snooping trust configuration unconditionally to the template can interfere with the other service instance, instance1.
In some cases, upgrading the old configuration to the new variant is viable, but in most situations, you likely want to avoid all device configuration changes. For the latter case, you need to add another parameter to the service model that selects the configuration variant. You must update the template too, producing the second iteration of the service.
With the updated configuration, you can now safely reconcile the instance2 service instance:
Nevertheless, keep in mind that the discard-non-service-config reconcile operation only considers parts of the device configuration under nodes that are created with the service mapping. Even if all data there is covered in the mapping, there could still be other parts that belong to the service but reside in an entirely different section of the device configuration (say, DNS configuration under ip name-server, which is outside the interface GigabitEthernet part) or even on a different device. That kind of configuration the discard-non-service-config option cannot find on its own, and you must add it manually.
You can find the complete iface service as part of the examples.ncs/service-management/discovery example (https://github.com/NSO-developer/nso-examples/tree/6.4/service-management/discovery).
Since there were only two service instances to reconcile, the process is now complete. In practice, you are likely to encounter multiple variants and many more service instances, requiring you to make additional iterations. But you can follow the iterative process shown here.
Partial Sync
In some cases, a service may need to rely on the actual device configurations to compute the changeset. It is often a requirement to pull the current device configurations from the network before executing such a service. Doing a full sync-from on a number of devices is an expensive task, especially if it needs to be performed often. The alternative in this case is to use partial-sync-from.
In cases where a multitude of service instances touch a device that is not entirely orchestrated using NSO (i.e. relying on the partial-sync-from feature described above) and the device needs to be replaced, all services need to be re-deployed. This can be expensive depending on the number of service instances. Partial-sync-to enables the replacement of devices in a more efficient fashion.
The partial-sync-from and partial-sync-to actions allow you to specify certain portions of the device's configuration to be pulled or pushed from or to the network, respectively, rather than the full configuration. These are more efficient operations on NETCONF devices and NEDs that support the partial-show feature. NEDs that do not support the partial-show feature will fall back to pulling or pushing the whole configuration.
Even though partial-sync-from and partial-sync-to allow you to pull or push only a part of the device's configuration, the actions are not allowed to break the consistency of configuration in CDB or on the device, as defined by the YANG model. Hence, extra consideration needs to be given to dependencies inside the device model. If some configuration item A depends on configuration item B in the device's configuration, pulling only A may fail due to the unsatisfied dependency on B. In this case, both A and B need to be pulled, even if the service is only interested in the value of A.
It is important to note that partial-sync-from and partial-sync-to clear the transaction ID of the device in NSO unless the whole configuration has been selected (e.g. /ncs:devices/ncs:device[ncs:name='ex0']/ncs:config). This ensures NSO does not miss any changes to other parts of the device configuration, but it does make the device out of sync.
Partial sync-from
Pulling the configuration from the network needs to be initiated outside the service code. At the same time, the list of configuration subtrees required by a certain service should be maintained by the service developer. Hence, it is good practice for such a service to implement a wrapper action that invokes the generic /devices/partial-sync-from action with the correct list of paths. The user or application that manages the service would only need to invoke the wrapper action without needing to know which parts of the configuration the service is interested in.
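As a hedged sketch, such a wrapper's core could invoke the generic action from Python as shown below; the path is modeled on the router example, and the exact action input and output names should be verified against the NSO device model.

```python
import ncs


def pull_service_prerequisites():
    # Pull only the configuration subtrees this service depends on.
    with ncs.maapi.single_read_trans('admin', 'system') as t:
        root = ncs.maagic.get_root(t)
        action = root.devices.partial_sync_from
        inp = action.get_input()
        inp.path = [
            "/ncs:devices/ncs:device[ncs:name='ex0']/ncs:config"
            "/r:sys/r:interfaces/r:interface[r:name='eth0']"
        ]
        output = action(inp)
        for result in output.sync_result:
            print(result.device, result.result)
```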
The snippet in the example below shows running the partial-sync-from action via Java, using the router device from the examples.ncs/device-management/router-network example.