NSMgr SDK like refactoring (networkservicemesh#1626)
* SDK Like refactorings

Signed-off-by: Andrey Sobolev <[email protected]>

* Allow tracing parameters override

Signed-off-by: Andrey Sobolev <[email protected]>

* Monitoring updates

Signed-off-by: Andrey Sobolev <[email protected]>

* Rollback api move

Signed-off-by: Andrey Sobolev <[email protected]>

* Refactor opentracing stuff

Signed-off-by: Andrey Sobolev <[email protected]>

* Opentracing fixes

Signed-off-by: Andrey Sobolev <[email protected]>

* Fixes for the case when opentracing is not enabled

Signed-off-by: Andrey Sobolev <[email protected]>

* Fix test

Signed-off-by: Andrey Sobolev <[email protected]>

* Fix NSM monitor test

Signed-off-by: Andrey Sobolev <[email protected]>

* Collect also init containers.

Signed-off-by: Andrey Sobolev <[email protected]>

* Add warmup for test

Signed-off-by: Andrey Sobolev <[email protected]>

* Fix init container logs retrieval

Signed-off-by: Andrey Sobolev <[email protected]>

* Review cleanups

Signed-off-by: Andrey Sobolev <[email protected]>

* Resolve merge issue

Signed-off-by: Andrey Sobolev <[email protected]>

* go sum fix

Signed-off-by: Andrey Sobolev <[email protected]>

* Fix tests compile

Signed-off-by: Andrey Sobolev <[email protected]>

* Use of spanhelper

Signed-off-by: Andrey Sobolev <[email protected]>

* More fixes

Signed-off-by: Andrey Sobolev <[email protected]>

* More context related stuff

Signed-off-by: Andrey Sobolev <[email protected]>

* Cleanup & merge fixes

Signed-off-by: Andrey Sobolev <[email protected]>

* Cleanups

Signed-off-by: Andrey Sobolev <[email protected]>

* Move spanhelper into proper place

Signed-off-by: Andrey Sobolev <[email protected]>

* Fixes from opentracing-fixes

Signed-off-by: Andrey Sobolev <[email protected]>

* Merge fixes and cleanups

Signed-off-by: Andrey Sobolev <[email protected]>

* Minor one line fix

Signed-off-by: Andrey Sobolev <[email protected]>

* Minor test fixes

Signed-off-by: Andrey Sobolev <[email protected]>

* Fix remote NSMD healing case

Signed-off-by: Andrey Sobolev <[email protected]>

* Fix NSMgr die timeout and long workflow

Signed-off-by: Andrey Sobolev <[email protected]>

* Fix dataplane die events so they are properly sent

Signed-off-by: Andrey Sobolev <[email protected]>

* Fix nse/nsmgr die healing

Signed-off-by: Andrey Sobolev <[email protected]>

* Healing should retry

Signed-off-by: Andrey Sobolev <[email protected]>

* Fix go deps

Signed-off-by: Andrey Sobolev <[email protected]>

* Revert go dependencies

Signed-off-by: Andrey Sobolev <[email protected]>

* Fix same endpoint selection failure, when NSMgr changed.

Signed-off-by: Andrey Sobolev <[email protected]>

* Fix dataplane do close if no connection established

Signed-off-by: Andrey Sobolev <[email protected]>

* Speedup testing a bit

Signed-off-by: Andrey Sobolev <[email protected]>

* Non-graceful dataplane gRPC server shutdown

Signed-off-by: Andrey Sobolev <[email protected]>

* Fix cross connection tests

Signed-off-by: Andrey Sobolev <[email protected]>

* Get deploy failure information for pods.

Signed-off-by: Andrey Sobolev <[email protected]>

* Merge fixes

Signed-off-by: Andrey Sobolev <[email protected]>

* Improve pod deploy error detection

Signed-off-by: Andrey Sobolev <[email protected]>

* Fixes

Return errors when a close is attempted while a close is already in progress.
Fix issue where the dataplane returns extra connections for which there are no workspace clients.

Signed-off-by: Andrey Sobolev <[email protected]>

* Fix restoring connections when the workspace is not found.

Signed-off-by: Andrey Sobolev <[email protected]>

* Fix pod log retrieval

Signed-off-by: Andrey Sobolev <[email protected]>

* Fix get logs

Signed-off-by: Andrey Sobolev <[email protected]>

* Fix previous logs

Signed-off-by: Andrey Sobolev <[email protected]>

* Fix remote VNI selection code

Signed-off-by: Andrey Sobolev <[email protected]>

* Fix NSE cleanup

Signed-off-by: Andrey Sobolev <[email protected]>
haiodo authored and edwarnicke committed Oct 16, 2019
1 parent 337df2b commit 1ff8903
Showing 70 changed files with 4,763 additions and 1,584 deletions.
1 change: 1 addition & 0 deletions .golangci.yml
@@ -97,6 +97,7 @@ linters:
- varcheck # deprecated
- unused # deprecated
- goimports
- dupl
enable-all: true
issues:
exclude-use-default: false
56 changes: 0 additions & 56 deletions controlplane/api/nsm/nsm.go

This file was deleted.

20 changes: 19 additions & 1 deletion controlplane/api/nsm/nsm_properties.go
@@ -13,13 +13,18 @@ const (
NsmdHealEnabled = "NSMD_HEAL_ENABLED" // Does healing is enabled or not
// NsmdHealDSTWaitTimeout - environment variable name - timeout of waiting for networkservice when healing connection
NsmdHealDSTWaitTimeout = "NSMD_HEAL_DST_TIMEOUTs" // Wait timeout for DST in seconds
// NsmdHealRetryCount - amount of times healing will retry
NsmdHealRetryCount = "NSMD_HEAL_RETRY_COUNT"
)

// Properties - holds properties of NSM connection events processing
type Properties struct {
HealTimeout time.Duration
CloseTimeout time.Duration
HealRequestTimeout time.Duration
HealRequestConnectTimeout time.Duration
HealRetryCount int
HealRetryDelay time.Duration
HealRequestConnectCheckTimeout time.Duration
HealDataplaneTimeout time.Duration

@@ -35,9 +40,12 @@ func NewNsmProperties() *Properties {
values := &Properties{
HealTimeout: time.Minute * 1,
CloseTimeout: time.Second * 5,
- HealRequestTimeout: time.Minute * 1,
+ HealRequestTimeout: time.Second * 20,
HealRequestConnectTimeout: time.Second * 15,
HealRequestConnectCheckTimeout: time.Second * 1,
HealDataplaneTimeout: time.Minute * 1,
HealRetryCount: 10,
HealRetryDelay: time.Second * 5,

// Total DST heal timeout is 20 seconds.
HealDSTNSEWaitTimeout: time.Second * 30, // Maximum time to wait for NSMD/NSE to re-appear
@@ -59,5 +67,15 @@ func NewNsmProperties() *Properties {
logrus.Errorf("Failed to parse DST wait timeout value... %v", err)
}
}

retryVal := os.Getenv(NsmdHealRetryCount)
if retryVal != "" {
value, err := strconv.ParseInt(retryVal, 10, 32)
if err != nil {
logrus.Error(err)
}
values.HealRetryCount = int(value)
}

return values
}
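
A note on the `NSMD_HEAL_RETRY_COUNT` handling above: as shown in the hunk, a parse error is logged but the assignment still runs, and `strconv.ParseInt` returns 0 for a non-numeric string, so a malformed value appears to leave `HealRetryCount` at 0 instead of the default of 10. The following is a minimal sketch of one way to keep the default in that case. The helper `healRetryCount`, its `lookup` parameter, and the two constants are hypothetical names introduced for illustration; they are not part of the commit.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// healRetryCountEnv mirrors the value of NsmdHealRetryCount from the diff.
const healRetryCountEnv = "NSMD_HEAL_RETRY_COUNT"

// defaultHealRetryCount mirrors the default assigned in NewNsmProperties.
const defaultHealRetryCount = 10

// healRetryCount is a hypothetical helper (an assumption, not code from the commit).
// It reads the retry count through lookup and falls back to the default when the
// variable is unset, malformed, or negative.
func healRetryCount(lookup func(string) string) int {
	raw := lookup(healRetryCountEnv)
	if raw == "" {
		return defaultHealRetryCount
	}
	value, err := strconv.ParseInt(raw, 10, 32)
	if err != nil || value < 0 {
		// Keep the default instead of overwriting it with a bogus value.
		return defaultHealRetryCount
	}
	return int(value)
}

func main() {
	// os.Getenv has the signature func(string) string, so it can be passed directly.
	fmt.Println(healRetryCount(os.Getenv))
}
```

Passing the lookup function as a parameter is only a convenience for testing; whether a malformed value should fall back to the default or abort startup is a design decision for the project.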
17 changes: 17 additions & 0 deletions controlplane/api/registry/utils.go
@@ -0,0 +1,17 @@
package registry

// EndpointNSMName - a type to hold the composite of an endpoint name and an NSM URL.
type EndpointNSMName string

// GetEndpointNSMName - returns Endpoint.Name + ":" + NetworkServiceManager.Url
func (nse *NSERegistration) GetEndpointNSMName() EndpointNSMName {
if nse == nil {
return ""
}
return NewEndpointNSMName(nse.NetworkServiceEndpoint, nse.NetworkServiceManager)
}

// NewEndpointNSMName - constructs an EndpointNSMName from an endpoint and a manager
func NewEndpointNSMName(endpoint *NetworkServiceEndpoint, manager *NetworkServiceManager) EndpointNSMName {
return EndpointNSMName(endpoint.Name + ":" + manager.Url)
}
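
To illustrate how the composite key might be used, here is a small self-contained sketch. The struct types below are simplified stand-ins for the generated registry types (an assumption: only the `Name` and `Url` fields used by the diff are modelled), and the endpoint name and URL are made-up values. The two functions follow the diff, and the map mirrors the `ignoreEndpoints` parameter type of `GetEndpoint` in `controlplane/pkg/api/nsm/nsm.go`.

```go
package main

import "fmt"

// Simplified stand-ins for the generated registry types (assumption).
type NetworkServiceEndpoint struct{ Name string }
type NetworkServiceManager struct{ Url string }

type NSERegistration struct {
	NetworkServiceEndpoint *NetworkServiceEndpoint
	NetworkServiceManager  *NetworkServiceManager
}

// EndpointNSMName holds the composite "endpoint name:manager URL" key.
type EndpointNSMName string

// NewEndpointNSMName builds the key as in the diff. Like the original, it does not
// guard against a nil endpoint or manager.
func NewEndpointNSMName(endpoint *NetworkServiceEndpoint, manager *NetworkServiceManager) EndpointNSMName {
	return EndpointNSMName(endpoint.Name + ":" + manager.Url)
}

// GetEndpointNSMName returns the key for a registration, or "" for a nil receiver.
func (nse *NSERegistration) GetEndpointNSMName() EndpointNSMName {
	if nse == nil {
		return ""
	}
	return NewEndpointNSMName(nse.NetworkServiceEndpoint, nse.NetworkServiceManager)
}

func main() {
	// Hypothetical registration values, for illustration only.
	reg := &NSERegistration{
		NetworkServiceEndpoint: &NetworkServiceEndpoint{Name: "nse-1"},
		NetworkServiceManager:  &NetworkServiceManager{Url: "10.0.0.1:5001"},
	}

	// The same endpoint name under a different manager yields a different key,
	// so an ignore set keyed this way distinguishes the two.
	ignore := map[EndpointNSMName]*NSERegistration{reg.GetEndpointNSMName(): reg}
	_, sameManager := ignore[NewEndpointNSMName(reg.NetworkServiceEndpoint, reg.NetworkServiceManager)]
	_, otherManager := ignore[NewEndpointNSMName(reg.NetworkServiceEndpoint, &NetworkServiceManager{Url: "10.0.0.2:5001"})]
	fmt.Println(reg.GetEndpointNSMName(), sameManager, otherManager)
}
```

Keying on both the endpoint name and the manager URL is presumably what the "Fix same endpoint selection failure, when NSMgr changed" item in the commit message relies on, though the diff shown here does not confirm that.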
8 changes: 4 additions & 4 deletions controlplane/cmd/nsmd/main.go
@@ -68,13 +68,13 @@ func main() {

model := model.NewModel() // This is TCP gRPC server uri to access this NSMD via network.
defer serviceRegistry.Stop()
- manager := nsm.NewNetworkServiceManager(model, serviceRegistry, pluginRegistry)
+ manager := nsm.NewNetworkServiceManager(span.Context(), model, serviceRegistry, pluginRegistry)

var server nsmd.NSMServer
var srvErr error
// Start NSMD server first, load local NSE/client registry and only then start dataplane/wait for it and recover active connections.

- if server, srvErr = nsmd.StartNSMServer(span.Context(), model, manager, serviceRegistry, apiRegistry); srvErr != nil {
+ if server, srvErr = nsmd.StartNSMServer(span.Context(), model, manager, apiRegistry); srvErr != nil {
logrus.Errorf("error starting nsmd service: %+v", srvErr)
return
}
@@ -88,12 +88,12 @@ func main() {
nsmdGoals.SetNsmServerReady()

// Register CrossConnect monitorCrossConnectServer client as ModelListener
- monitorCrossConnectClient := nsmd.NewMonitorCrossConnectClient(server, server.XconManager(), server)
+ monitorCrossConnectClient := nsmd.NewMonitorCrossConnectClient(model, server, server.XconManager(), server)
model.AddListener(monitorCrossConnectClient)

// Starting dataplane
logrus.Info("Starting Dataplane registration server...")
- if err := server.StartDataplaneRegistratorServer(); err != nil {
+ if err := server.StartDataplaneRegistratorServer(span.Context()); err != nil {
span.LogError(errors.Wrap(err, "error starting dataplane service"))
return
}
111 changes: 111 additions & 0 deletions controlplane/pkg/api/nsm/nsm.go
@@ -0,0 +1,111 @@
// Copyright (c) 2019 Cisco and/or its affiliates.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at:
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

package nsm

import (
"time"

"github.com/networkservicemesh/networkservicemesh/controlplane/api/crossconnect"
"github.com/networkservicemesh/networkservicemesh/controlplane/api/nsm"
unified_connection "github.com/networkservicemesh/networkservicemesh/controlplane/api/nsm/connection"
unified_networkservice "github.com/networkservicemesh/networkservicemesh/controlplane/api/nsm/networkservice"
"github.com/networkservicemesh/networkservicemesh/controlplane/pkg/model"
"github.com/networkservicemesh/networkservicemesh/controlplane/pkg/plugins"
"github.com/networkservicemesh/networkservicemesh/controlplane/pkg/serviceregistry"
"github.com/networkservicemesh/networkservicemesh/sdk/monitor"
crossconnect_monitor "github.com/networkservicemesh/networkservicemesh/sdk/monitor/crossconnect"

"golang.org/x/net/context"

local_networkservice "github.com/networkservicemesh/networkservicemesh/controlplane/api/local/networkservice"
"github.com/networkservicemesh/networkservicemesh/controlplane/api/registry"
remote_networkservice "github.com/networkservicemesh/networkservicemesh/controlplane/api/remote/networkservice"
)

// ClientConnection is an interface for client connection
type ClientConnection interface {
GetID() string
GetConnectionSource() unified_connection.Connection
GetConnectionDestination() unified_connection.Connection
GetNetworkService() string
}

// NetworkServiceClient is an interface for network service client
type NetworkServiceClient interface {
Request(ctx context.Context, request unified_networkservice.Request) (unified_connection.Connection, error)
Close(ctx context.Context, connection unified_connection.Connection) error

Cleanup() error
}

// HealState - keep the cause of healing process
type HealState int32

const (
// HealStateDstDown is a case when destination is down: we need to restore it and re-program local Dataplane.
HealStateDstDown HealState = 1
// HealStateSrcDown is a case when source is down: most probable will not happen yet.
HealStateSrcDown HealState = 2
// HealStateDataplaneDown is a case when local Dataplane is down: we need to heal NSE/Remote NSM and local Dataplane.
HealStateDataplaneDown HealState = 3
// HealStateDstUpdate is a case when destination is updated: we need to re-program local Dataplane.
HealStateDstUpdate HealState = 4
// HealStateDstNmgrDown is a case when destination and/or Remote NSM is down: we need to heal NSE/Remote NSM.
HealStateDstNmgrDown HealState = 5
)

// NetworkServiceRequestManager - allow to provide local and remote service interfaces.
type NetworkServiceRequestManager interface {
LocalManager(clientConnection ClientConnection) local_networkservice.NetworkServiceServer
RemoteManager() remote_networkservice.NetworkServiceServer
}

// NetworkServiceHealProcessor - perform Healing operations
type NetworkServiceHealProcessor interface {
Heal(ctx context.Context, clientConnection ClientConnection, healState HealState)
CloseConnection(ctx context.Context, clientConnection ClientConnection) error
}

// MonitorManager is an interface to provide access to different monitors
type MonitorManager interface {
CrossConnectMonitor() crossconnect_monitor.MonitorServer
LocalConnectionMonitor(workspace string) monitor.Server
}

//NetworkServiceManager - hold useful nsm structures
type NetworkServiceManager interface {
GetHealProperties() *nsm.Properties
WaitForDataplane(ctx context.Context, duration time.Duration) error
RemoteConnectionLost(ctx context.Context, clientConnection ClientConnection)
NotifyRenamedEndpoint(nseOldName, nseNewName string)
// Getters
NseManager() NetworkServiceEndpointManager
SetRemoteServer(server remote_networkservice.NetworkServiceServer)

Model() model.Model

NetworkServiceHealProcessor
ServiceRegistry() serviceregistry.ServiceRegistry
PluginRegistry() plugins.PluginRegistry
RestoreConnections(xcons []*crossconnect.CrossConnect, dataplane string, manager MonitorManager)
}

//NetworkServiceEndpointManager - manages endpoints, TODO: Will be removed in next PRs.
type NetworkServiceEndpointManager interface {
GetEndpoint(ctx context.Context, requestConnection unified_connection.Connection, ignoreEndpoints map[registry.EndpointNSMName]*registry.NSERegistration) (*registry.NSERegistration, error)
CreateNSEClient(ctx context.Context, endpoint *registry.NSERegistration) (NetworkServiceClient, error)
IsLocalEndpoint(endpoint *registry.NSERegistration) bool
CheckUpdateNSE(ctx context.Context, reg *registry.NSERegistration) bool
}
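
The commit message mentions that healing should retry, and `nsm_properties.go` adds `HealRetryCount` (default 10) and `HealRetryDelay` (default 5 seconds). The heal implementation itself is not part of the files shown here, so the following is only a sketch of a bounded retry loop with a fixed delay, which is one way such properties could be consumed. `retryHeal` and `errNotReady` are hypothetical names, and a real implementation would likely also honour cancellation of the `context.Context` passed to `Heal`.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errNotReady is a hypothetical transient error used only in this illustration.
var errNotReady = errors.New("endpoint not ready")

// retryHeal is a hypothetical helper, not code from the commit. It calls attempt
// up to count times, sleeping for delay between failed attempts, and wraps the
// last error if every attempt fails.
func retryHeal(count int, delay time.Duration, attempt func() error) error {
	if count <= 0 {
		return errors.New("retry count must be positive")
	}
	var err error
	for i := 0; i < count; i++ {
		if err = attempt(); err == nil {
			return nil
		}
		// Do not sleep after the final failed attempt.
		if i < count-1 {
			time.Sleep(delay)
		}
	}
	return fmt.Errorf("giving up after %d attempts: %w", count, err)
}

func main() {
	calls := 0
	// A short delay is used here so the example finishes quickly; the defaults in
	// the diff are 10 attempts and 5 seconds.
	err := retryHeal(10, 10*time.Millisecond, func() error {
		calls++
		if calls < 3 {
			return errNotReady // simulate two transient failures
		}
		return nil
	})
	fmt.Println(calls, err)
}
```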