
Design document to improve the sql storage performance by using bulk save api #194

Closed
wants to merge 2 commits

Conversation

@sanopsmx (Contributor) commented Nov 1, 2020

Improve the sql storage performance by using bulk save api

Status: Proposed, Accepted, Implemented, Obsolete
RFC #: spinnaker/gate#1385, spinnaker/front50#988, spinnaker/orca#3987
Author(s): Sanjeev Thatiparthi (@sanopsmx) (https://github.com/sanopsmx)
SIG / WG: sig-security

Overview

Related issue: spinnaker/spinnaker#6147

This proposal applies only to SQL storage.

Today, saving multiple pipelines requires calling the Front50 save API serially, one pipeline at a time, which makes bulk operations slow.

We propose to improve Front50 save performance by adding a bulk save API that accepts a list of pipelines in a single REST call and saves them in Front50 with a bulk-insert query, greatly improving throughput when many pipelines are saved at once.

Goals and Non-Goals

Goals: provide a bulk save API that accepts a list of pipelines in a single REST call and saves them in Front50 using a bulk-insert query, so that saving many pipelines at once is significantly faster. A sketch of the core idea follows.

Non-goals: supporting non-SQL storage backends; this proposal targets SQL (RDBMS) storage only.
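
To make the core idea concrete, here is a minimal sketch (not Front50's actual code; the pipelines(id, body) table and the plain-JDBC approach are assumptions for illustration) showing how one multi-row INSERT replaces N separate statements:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.util.List;
import java.util.UUID;

class BulkInsertSketch {
  void bulkInsert(Connection conn, List<String> pipelineBodies) throws Exception {
    // Build "INSERT INTO pipelines (id, body) VALUES (?, ?), (?, ?), ..." so the
    // whole list travels in a single statement instead of N separate saves.
    StringBuilder sql = new StringBuilder("INSERT INTO pipelines (id, body) VALUES ");
    for (int i = 0; i < pipelineBodies.size(); i++) {
      sql.append(i == 0 ? "(?, ?)" : ", (?, ?)");
    }
    try (PreparedStatement ps = conn.prepareStatement(sql.toString())) {
      int idx = 1;
      for (String body : pipelineBodies) {
        ps.setString(idx++, UUID.randomUUID().toString()); // generated pipeline id
        ps.setString(idx++, body);                         // serialized pipeline JSON
      }
      ps.executeUpdate(); // one round trip for the whole batch
    }
  }
}

The same effect can also be achieved with JDBC batching (addBatch/executeBatch), as sketched later in the Front50 flow.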

Motivation and Rationale

Improve the performance of saving multiple pipelines at once by using a bulk-insert SQL query.

Timeline

Already Implemented

Design

  1. Deck calls the Gate API:

    1.1 url: http://<ip address>:8084/pipelines/bulksave
    method: POST
    payload: [{"keepWaitingPipelines":false,"limitConcurrent":true,"application":"apptest","spelEvaluator":"v4",
    "lastModifiedBy":"admin","name":"pipeline_test","stages":[{"refId":"1","requisiteStageRefIds":[],
    "type":"wait","name":"Wait","waitTime":30}],"index":3,"id":"66a5c77b-3998-495d-ad61-117313c5800c",
    "triggers":[],"updateTs":"1599652266000"},{"keepWaitingPipelines":false,"limitConcurrent":true,
    "application":"apptest","spelEvaluator":"v4","lastModifiedBy":"admin","name":"pipeline_test1",
    "stages":[{"refId":"1","requisiteStageRefIds":[],"type":"wait","name":"Wait","waitTime":30}],
    "index":4,"id":"66a5c77b-3998-495d-ad61-117313c5800c","triggers":[],"updateTs":"1599652266000"}......]

  2. Gate API flow:

    2.1 The POST request sent by Deck hits the PipelineController REST endpoint:

     com.netflix.spinnaker.gate.controllers.PipelineController.bulksavePipeline(Body pipeline..)

    2.2 TaskService creates a new task (synchronous job) for saving the pipelines:

     taskService.createAndWaitForCompletion(operation)

    2.3 The new task calls the Orca POST API to save the pipelines:

     orcaService.doOperation(@Body Map<String, ? extends Object> body)
     url: http://<ip address>:8083/ops
     method: POST
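
    A minimal sketch of what the Gate endpoint could look like. bulksavePipeline and taskService.createAndWaitForCompletion(operation) are named above; the annotations, the TaskService stand-in, and the shape of the operation map are assumptions:

    import java.util.List;
    import java.util.Map;
    import org.springframework.web.bind.annotation.PostMapping;
    import org.springframework.web.bind.annotation.RequestBody;
    import org.springframework.web.bind.annotation.RequestMapping;
    import org.springframework.web.bind.annotation.RestController;

    // Stand-in for Gate's existing TaskService (createAndWaitForCompletion
    // is named in step 2.2); the real class lives in Gate.
    interface TaskService {
      Map createAndWaitForCompletion(Map operation);
    }

    @RestController
    @RequestMapping("/pipelines")
    class BulkSavePipelineController {
      private final TaskService taskService;

      BulkSavePipelineController(TaskService taskService) {
        this.taskService = taskService;
      }

      @PostMapping("/bulksave")
      Map bulksavePipeline(@RequestBody List<Map> pipelines) {
        // Wrap the pipeline list in an Orca operation; the exact map shape is assumed.
        Map operation =
            Map.of(
                "description", "Bulk save pipelines",
                "application", "bulksave", // see Assumptions: bulk saves run under a fixed app name
                "job", List.of(Map.of("type", "savePipeline", "pipelines", pipelines)));
        // Synchronous: create the task and wait for Orca to complete it (step 2.2).
        return taskService.createAndWaitForCompletion(operation);
      }
    }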
    
  3. Orca API flow:

    3.1 The POST request sent by Gate hits the OperationsController REST endpoint:

    Map<String, String> ops(@RequestBody Map input) in com.netflix.spinnaker.orca.controllers.OperationsController

    3.2 This method starts a new execution via QueueExecutionRunner.kt.
    3.3 The QueueExecutionRunner calls StartExecutionHandler.kt.
    3.4 StartExecutionHandler.kt invokes SavePipelineTask.java.
    3.5 SavePipelineTask calls the Front50 REST API:

    Response response = front50Service.savePipelineList(List<Pipeline>);
    url: http://<ip address>:8080/pipelines/bulksave
    method: POST
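
    A minimal sketch of the task behavior in step 3.5. front50Service.savePipelineList is named above; the stand-in interfaces and the stage-context key are assumptions:

    import java.util.List;
    import java.util.Map;

    // Stand-ins for Orca's Front50 client and its Retrofit-style response;
    // only savePipelineList comes from this design, the rest is illustrative.
    interface Front50Response {
      int getStatus();
    }

    interface Front50Service {
      Front50Response savePipelineList(List<Map<String, Object>> pipelines);
    }

    class BulkSavePipelineTask {
      private final Front50Service front50Service;

      BulkSavePipelineTask(Front50Service front50Service) {
        this.front50Service = front50Service;
      }

      /** Pulls the pipeline list from the stage context and bulk-saves it in one call. */
      @SuppressWarnings("unchecked")
      boolean execute(Map<String, Object> stageContext) {
        List<Map<String, Object>> pipelines =
            (List<Map<String, Object>>) stageContext.get("pipelines");
        // One REST call to Front50 for the whole list, instead of one call per pipeline.
        Front50Response response = front50Service.savePipelineList(pipelines);
        return response.getStatus() == 200; // maps to SUCCEEDED vs. TERMINAL in a real Orca task
      }
    }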

  4. Fiat API flow:

    4.1 Front50 checks with Fiat for permissions:

    @PreAuthorize( "@fiatPermissionEvaluator.storeWholePermission() and
    authorizationSupport.hasRunAsUserPermission(List<pipeline>)")

    4.2 storeWholePermission() checks that the permissions object is not null.
    4.3 hasPermission() checks whether the particular resource (APPLICATION) has WRITE permission for the save.
    4.4 The authorizationSupport.hasRunAsUserPermission() check happens in Front50,
    which calls Fiat to check whether the particular resource (APPLICATION) has EXECUTE permission.
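
    A minimal sketch of the run-as-user check in step 4.4, assuming Spring Security's PermissionEvaluator as the Fiat hook; the class name and context handling are illustrative, not Front50's actual code:

    import java.util.List;
    import java.util.Map;
    import org.springframework.security.access.PermissionEvaluator;
    import org.springframework.security.core.Authentication;
    import org.springframework.security.core.context.SecurityContextHolder;

    class RunAsUserAuthorizationSupport {
      private final PermissionEvaluator permissionEvaluator; // Fiat supplies the real evaluator

      RunAsUserAuthorizationSupport(PermissionEvaluator permissionEvaluator) {
        this.permissionEvaluator = permissionEvaluator;
      }

      /** Rejects the whole batch if any pipeline's application lacks EXECUTE permission. */
      public boolean hasRunAsUserPermission(List<Map<String, Object>> pipelines) {
        Authentication auth = SecurityContextHolder.getContext().getAuthentication();
        return pipelines.stream()
            .map(p -> (String) p.get("application"))
            .allMatch(app -> permissionEvaluator.hasPermission(auth, app, "APPLICATION", "EXECUTE"));
      }
    }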

  5. Front50 API flow:

    5.1 The POST request sent by Orca hits the PipelineController REST endpoint:

    com.netflix.spinnaker.front50.controllers.PipelineController.bulksave(
        @RequestBody List<Pipeline> pipelineList,
        @RequestParam(value = "staleCheck", required = false) Boolean staleCheck)

    5.2 PipelineDAO then saves the pipeline list and returns the response:

    pipelineDAO.bulkImport(savePipelineList);
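
    A minimal sketch of what pipelineDAO.bulkImport could do at the SQL layer, using a JDBC batch inside a single transaction; the table and column names and the Pipeline stand-in are assumptions, and Front50's real SQL implementation differs:

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;
    import java.util.List;

    // Stand-in for Front50's Pipeline model; only id and a serializer hook
    // (hypothetical) are needed here.
    interface Pipeline {
      String getId();
      String toJson();
    }

    class SqlPipelineBulkImport {
      void bulkImport(Connection conn, List<Pipeline> pipelines) throws SQLException {
        conn.setAutoCommit(false); // make the whole batch all-or-nothing
        String sql = "INSERT INTO pipelines (id, body, last_modified_at) VALUES (?, ?, ?)";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
          for (Pipeline pipeline : pipelines) {
            ps.setString(1, pipeline.getId());
            ps.setString(2, pipeline.toJson());        // serialized pipeline body
            ps.setLong(3, System.currentTimeMillis());
            ps.addBatch();                             // queue the row; nothing sent yet
          }
          ps.executeBatch();                           // send every queued row in one batch
          conn.commit();
        } catch (SQLException e) {
          conn.rollback();
          throw e;
        }
      }
    }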
    

Usage:

The bulk API is invoked using Gate. The invocation is:

curl -X POST -H "Content-type: application/json" -d '[
  {
    "keepWaitingPipelines": false,
    "limitConcurrent": true,
    "application": "test_004",
    "spelEvaluator": "v4",
    "name": "pipe1001",
    "stages": [
      {
        "requisiteStageRefIds": [],
        "name": "Wait",
        "refId": "1",
        "type": "wait",
        "waitTime": 6
      }
    ],
    "index": 1000,
    "triggers": []
  },
  {
    "keepWaitingPipelines": false,
    "limitConcurrent": true,
    "application": "test_005",
    "spelEvaluator": "v4",
    "name": "pipe1002",
    "stages": [
      {
        "requisiteStageRefIds": [],
        "name": "Wait",
        "refId": "1",
        "type": "wait",
        "waitTime": 6
      }
    ],
    "index": 1001,
    "triggers": []
  }
]' http://<ip address>:8084/pipelines/bulksave

Output:

{
  "Successful": <number of pipelines saved>,
  "Failed": <number of pipelines that failed>,
  "Failed_list": [<array of failed pipelines - (application, pipeline name, etc.) and the error message>]
}
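
For illustration, a filled-in response could look like the following; the values and the failed-entry field names are hypothetical:

{
  "Successful": 2,
  "Failed": 1,
  "Failed_list": [
    {
      "application": "test_006",
      "name": "pipe1003",
      "error": "Invalid pipeline definition"
    }
  ]
}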

Assumptions:

This solution works only for an RDBMS-backed (SQL) store.
Pipeline saves run in the context of an application, irrespective of the content of the pipeline.
In the case of bulk save, we will use the application name "bulk save".
The implication is that bulk-save tasks will not show up under Tasks in their respective applications.

Sample json payload:

An array of valid Spinnaker pipeline JSON documents.

Example:
[
  {
    "keepWaitingPipelines": false,
    "limitConcurrent": true,
    "application": "test_004",
    "spelEvaluator": "v4",
    "name": "pipe1001",
    "stages": [
      {
        "requisiteStageRefIds": [],
        "name": "Wait",
        "refId": "1",
        "type": "wait",
        "waitTime": 6
      }
    ],
    "index": 1000,
    "triggers": []
  },
  {
    "keepWaitingPipelines": false,
    "limitConcurrent": true,
    "application": "test_005",
    "spelEvaluator": "v4",
    "name": "pipe1002",
    "stages": [
      {
        "requisiteStageRefIds": [],
        "name": "Wait",
        "refId": "1",
        "type": "wait",
        "waitTime": 6
      }
    ],
    "index": 1001,
    "triggers": []
  },
  {
    "keepWaitingPipelines": false,
    "limitConcurrent": true,
    "application": "test_006",
    "spelEvaluator": "v4",
    "name": "pipe1003",
    "stages": [
      {
        "requisiteStageRefIds": [],
        "name": "Wait",
        "refId": "1",
        "type": "wait",
        "waitTime": 6
      }
    ],
    "index": 1002,
    "triggers": []
  }
]

updated bulksave.md with the spinnaker issue no.
@robzienert (Member) commented:

I'm unclear about when this would be used. Are we changing Deck to save multiple pipelines, and if so, when does that occur?

@emagana-zz commented:

@robzienert Hi there! @sanopsmx, his team, and Salesforce are working together on this new feature. Basically, we templatize all our pipelines, and we have macros that can apply to all of them (over 100K), so we need to update them all. This is why we need a bulk API to speed up writing the pipelines. Without this feature we must update one pipeline at a time, which makes the process extremely slow: one update pass could take over 5 hours for those 100K pipelines. Let me know if I can address more questions.

@ajordens (Contributor) commented Nov 3, 2020:

I added a comment on one of the PRs (I believe) but there's some prior art in front50 around this already.

See the /batchUpdate endpoints (edit: I see that the proposal looks to use them).

On the other hand, I think with this intended volume of discrete pipelines you're trying to fit a square peg into a round hole.

What some folks have done on our side is build a pipeline generator outside of Spinnaker. The generator itself is parameterized and generates pipeline execution json that is fed into orca. Each permutation is not tracked as an individual pipeline config.


2.1 The post request sent by the deck will hit the PipelineController Rest Controller endpoint.

com.netflix.spinnaker.gate.controllers.PipelineController.bulksavePipeline(Body pipeline..)
Reviewer (Contributor) commented:

Not sure why we need a dedicated endpoint for this ... can it not just be submitted asynchronously as a task and polled?

We should avoid taskService.createAndWaitForCompletion() wherever possible.

@sanopsmx (Author) replied:

We kept the dedicated endpoint as a REST call. We were able to submit 3000+ pipelines through a JSON payload and got a response time of ~2 minutes; hence we made the call synchronous. It can be called from Deck as well.

@sanopsmx (Author) replied:

We are adding only two REST endpoints: one in Gate and the other in Front50. We are not adding any REST endpoint in Orca.

@ajordens (Contributor) left a comment:

What happens if there are SpEL expressions? I believe the current SavePipelineTask deals with this by having the pipeline contents be Base64 encoded.

3.3 The QueueExecutionRunner will call the StartExecutionHandler.kt.
3.4 The StartExecutionHandler.kt will call the SavePipelineTask.java.
3.5 SavePipelineTask will call the front 50 rest api.
Response response = front50Service.savePipelineList(List<pipeline>);
Reviewer (Contributor) commented:

savePipelineList would be better as savePipelines

@sanopsmx (Author) replied:

done


5.1 The post request sent by the orca will hit the PipelineController Rest Controller endpoint.

com.netflix.spinnaker.front50.controllers.OperationsController.bulksave(
Reviewer (Contributor) commented:

I don't know what OperationsController is, I can only assume it's actually going to call the PipelineController.batchUpdate() endpoint and we'll be handling whether it's admin vs. non-admin?

Reviewer (Contributor) commented:

Caveat that the batchUpdate() endpoint wasn't expected to be exposed and lacks the validation provided by save().

@sanopsmx (Author) replied:

It should be PipelineController and not OperationsController. Corrected the document too.

@ajordens (Contributor) commented Dec 2, 2020:

Per my earlier comment: if the intention is to batch update 10k+ pipelines, I don't think this is the right approach.

@robzienert (Member) left a comment:

Agree with @ajordens.

My current vote is -1.

This proposal is written to support a use case which I think is misusing the system; a square peg through a round hole, as mentioned before. It's unclear to me how this change would be beneficial to the community as a whole.

Rather than this, I would prefer an RFC for adding extension points so you can do whatever you want. In that case, you already have support for creating a new bulk pipeline save stage in Orca, and then you'd just need some extension points in Front50 to service this request.

This being said, I do think the underlying use case is approaching the problem incorrectly.

@sanopsmx (Author) commented Dec 8, 2020:

Per my earlier comment: if the intention is to batch update 10k+ pipelines, I don't think this is the right approach.

Are you suggesting multipart/form-data files for batch updating 10k+ pipelines?

@dilippai commented:

Re: use case fit: I don't believe it's unreasonable to expect a pipeline orchestration system to scale to support a large number of actual pipelines, which may be created relatively dynamically. I'm concerned about cases that we considered critical to our choice of Spinnaker as a pipeline orchestration system being considered odd or abnormal. I would expect some degree of flexibility in how users use the platform. I don't see bulk creation of pipelines as an unusual pattern for a business with a very large footprint.

@ajordens (Contributor) commented:

Re: use case fit: I don't believe it's unreasonable to expect a pipeline orchestration system to scale to support a large number of actual pipelines, which may be created relatively dynamically. I'm concerned about cases that we considered critical to our choice of Spinnaker as a pipeline orchestration system being considered odd or abnormal. I would expect some degree of flexibility in how users use the platform. I don't see bulk creation of pipelines as an unusual pattern for a business with a very large footprint.

First and foremost, Spinnaker is a platform that provides a set of building blocks. In many cases, complete solutions can be built from these building blocks.

Investments have been made towards supporting extensibility for use cases that, at present, may fall outside of what's natively provided. Speaking as a Netflix engineer, we've been using these points of extension for many years, as have other large enterprises in the community. They are intended specifically for situations like this where a targeted solution is needed quickly.

Ignoring the merits of this RFC for a moment, the proposed PRs backing this RFC need significant improvement before they could be considered mergeable.

@rebalag commented Jan 29, 2021:

Supporting pipeline CRUD operations should be one of the core functions of Spinnaker as a platform. In many deployments, pipelines are managed with templates: pipeline JSON structures are generated from these templates based on enterprise requirements and then saved to Spinnaker through the API. This operation occurs frequently enough that there is a requirement for efficiently saving pipelines in bulk.

Changes to pipelines can occur in a Spinnaker deployment because of:

  1. Changes in policies associated with pipelines
  2. Changes to metadata in the pipelines for better analytics of executed pipelines
  3. Synchronizing pipelines in Spinnaker with Git spanning multiple applications

Applying these changes one pipeline at a time with the currently available API takes hours to complete. Exposing the batchUpdate API already available in Front50 can perform an order of magnitude faster.

It seems to me that saving pipelines one at a time or in bulk should belong to the Spinnaker core platform.

@ajordens (Contributor) commented:

Let's step back a bit.

I'm not against the idea of a bulk api but the particular implementation details called out in this proposal were problematic.

Proposals that have accompanying implementation details are challenging in that the very literal implementation often distracts from the underlying idea.

In the future, a governance RFC like this would benefit from focusing on the idea, motivations, alternatives, and any potential problems that might fall out, and less on "it's going to change this file, etc.".

@sanopsmx (Author) commented:

Supporting pipeline CRUD operations should be one of the core functions of Spinnaker as a platform. In many deployments, pipelines are managed with templates: pipeline JSON structures are generated from these templates based on enterprise requirements and then saved to Spinnaker through the API. This operation occurs frequently enough that there is a requirement for efficiently saving pipelines in bulk.

Changes to pipelines can occur in a Spinnaker deployment because of:

  1. Changes in policies associated with pipelines
  2. Changes to metadata in the pipelines for better analytics of executed pipelines
  3. Synchronizing pipelines in Spinnaker with Git spanning multiple applications

Applying these changes one pipeline at a time with the currently available API takes hours to complete. Exposing the batchUpdate API already available in Front50 can perform an order of magnitude faster.

It seems to me that saving pipelines one at a time or in bulk should belong to the Spinnaker core platform.

We do need a bulk save method of saving multiple pipelines in one go.

@sanopsmx (Author) commented:

Let's step back a bit.

I'm not against the idea of a bulk api but the particular implementation details called out in this proposal were problematic.

Proposals that have accompanying implementation details are challenging in that the very literal implementation often distracts from the underlying idea.

In the future, a governance RFC like this would benefit from focusing on the idea, motivations, alternatives, and any potential problems that might fall out, and less on "it's going to change this file, etc.".

You mentioned extensions in your previous comments. Would a plugin-based approach to saving multiple pipelines work here?

@sanopsmx (Author) commented Feb 3, 2021:

Raised a new PR, #214; hence closing this PR.
