
Design document to improve the sql storage performance by using bulk save api #194

Closed
wants to merge 2 commits

Conversation

@sanopsmx (Contributor) commented Nov 1, 2020

Improve the sql storage performance by using bulk save api

Status: Proposed, Accepted, Implemented, Obsolete
RFC #: spinnaker/gate#1385, spinnaker/front50#988, spinnaker/orca#3987
Author(s): Sanjeev Thatiparthi (@sanopsmx) (https://github.com/sanopsmx)
SIG / WG: sig-security

Overview

Related issue: spinnaker/spinnaker#6147

This proposal applies only to SQL storage.

Today, saving multiple pipelines requires calling the Front50 save API serially, one pipeline at a time, which makes bulk operations slow.

We propose to improve Front50 save performance by adding a bulk save API that accepts a list of pipelines in a single REST call and saves them in Front50 with a bulk-insert query, greatly improving throughput when many pipelines are saved at once.

Goals and Non-Goals

Goals: provide a bulk save API that accepts a list of pipelines in a single REST call and saves them in Front50 using a bulk-insert query, so that saving many pipelines at once is significantly faster. A sketch of the core idea follows.

Non-goals: supporting non-SQL storage backends; this proposal targets SQL (RDBMS) storage only.
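
To make the core idea concrete, here is a minimal sketch (not Front50's actual code; the pipelines(id, body) table and the plain-JDBC approach are assumptions for illustration) showing how one multi-row INSERT replaces N separate statements:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.util.List;
import java.util.UUID;

class BulkInsertSketch {
  void bulkInsert(Connection conn, List<String> pipelineBodies) throws Exception {
    // Build "INSERT INTO pipelines (id, body) VALUES (?, ?), (?, ?), ..." so the
    // whole list travels in a single statement instead of N separate saves.
    StringBuilder sql = new StringBuilder("INSERT INTO pipelines (id, body) VALUES ");
    for (int i = 0; i < pipelineBodies.size(); i++) {
      sql.append(i == 0 ? "(?, ?)" : ", (?, ?)");
    }
    try (PreparedStatement ps = conn.prepareStatement(sql.toString())) {
      int idx = 1;
      for (String body : pipelineBodies) {
        ps.setString(idx++, UUID.randomUUID().toString()); // generated pipeline id
        ps.setString(idx++, body);                         // serialized pipeline JSON
      }
      ps.executeUpdate(); // one round trip for the whole batch
    }
  }
}

The same effect can also be achieved with JDBC batching (addBatch/executeBatch), as sketched later in the Front50 flow.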

Motivation and Rationale

Improve the performance of saving multiple pipelines at once by using a bulk-insert SQL query.

Timeline

Already Implemented

Design

  1. Deck calls the Gate API:

    1.1 url: http://<ip address>:8084/pipelines/bulksave
    method: POST
    payload: [{"keepWaitingPipelines":false,"limitConcurrent":true,"application":"apptest","spelEvaluator":"v4",
    "lastModifiedBy":"admin","name":"pipeline_test","stages":[{"refId":"1","requisiteStageRefIds":[],
    "type":"wait","name":"Wait","waitTime":30}],"index":3,"id":"66a5c77b-3998-495d-ad61-117313c5800c",
    "triggers":[],"updateTs":"1599652266000"},{"keepWaitingPipelines":false,"limitConcurrent":true,
    "application":"apptest","spelEvaluator":"v4","lastModifiedBy":"admin","name":"pipeline_test1",
    "stages":[{"refId":"1","requisiteStageRefIds":[],"type":"wait","name":"Wait","waitTime":30}],
    "index":4,"id":"66a5c77b-3998-495d-ad61-117313c5800c","triggers":[],"updateTs":"1599652266000"}......]

  2. Gate API flow:

    2.1 The POST request sent by Deck hits the PipelineController REST endpoint:

     com.netflix.spinnaker.gate.controllers.PipelineController.bulksavePipeline(Body pipeline..)

    2.2 TaskService creates a new task (synchronous job) for saving the pipelines:

     taskService.createAndWaitForCompletion(operation)

    2.3 The new task calls the Orca POST API to save the pipelines:

     orcaService.doOperation(@Body Map<String, ? extends Object> body)
     url: http://<ip address>:8083/ops
     method: POST
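
    A minimal sketch of what the Gate endpoint could look like. bulksavePipeline and taskService.createAndWaitForCompletion(operation) are named above; the annotations, the TaskService stand-in, and the shape of the operation map are assumptions:

    import java.util.List;
    import java.util.Map;
    import org.springframework.web.bind.annotation.PostMapping;
    import org.springframework.web.bind.annotation.RequestBody;
    import org.springframework.web.bind.annotation.RequestMapping;
    import org.springframework.web.bind.annotation.RestController;

    // Stand-in for Gate's existing TaskService (createAndWaitForCompletion
    // is named in step 2.2); the real class lives in Gate.
    interface TaskService {
      Map createAndWaitForCompletion(Map operation);
    }

    @RestController
    @RequestMapping("/pipelines")
    class BulkSavePipelineController {
      private final TaskService taskService;

      BulkSavePipelineController(TaskService taskService) {
        this.taskService = taskService;
      }

      @PostMapping("/bulksave")
      Map bulksavePipeline(@RequestBody List<Map> pipelines) {
        // Wrap the pipeline list in an Orca operation; the exact map shape is assumed.
        Map operation =
            Map.of(
                "description", "Bulk save pipelines",
                "application", "bulksave", // see Assumptions: bulk saves run under a fixed app name
                "job", List.of(Map.of("type", "savePipeline", "pipelines", pipelines)));
        // Synchronous: create the task and wait for Orca to complete it (step 2.2).
        return taskService.createAndWaitForCompletion(operation);
      }
    }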
    
  3. Orca API flow:

    3.1 The POST request sent by Gate hits the OperationsController REST endpoint:

    Map<String, String> ops(@RequestBody Map input) in com.netflix.spinnaker.orca.controllers.OperationsController

    3.2 This method starts a new execution via QueueExecutionRunner.kt.
    3.3 The QueueExecutionRunner calls StartExecutionHandler.kt.
    3.4 StartExecutionHandler.kt invokes SavePipelineTask.java.
    3.5 SavePipelineTask calls the Front50 REST API:

    Response response = front50Service.savePipelineList(List<Pipeline>);
    url: http://<ip address>:8080/pipelines/bulksave
    method: POST
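
    A minimal sketch of the task behavior in step 3.5. front50Service.savePipelineList is named above; the stand-in interfaces and the stage-context key are assumptions:

    import java.util.List;
    import java.util.Map;

    // Stand-ins for Orca's Front50 client and its Retrofit-style response;
    // only savePipelineList comes from this design, the rest is illustrative.
    interface Front50Response {
      int getStatus();
    }

    interface Front50Service {
      Front50Response savePipelineList(List<Map<String, Object>> pipelines);
    }

    class BulkSavePipelineTask {
      private final Front50Service front50Service;

      BulkSavePipelineTask(Front50Service front50Service) {
        this.front50Service = front50Service;
      }

      /** Pulls the pipeline list from the stage context and bulk-saves it in one call. */
      @SuppressWarnings("unchecked")
      boolean execute(Map<String, Object> stageContext) {
        List<Map<String, Object>> pipelines =
            (List<Map<String, Object>>) stageContext.get("pipelines");
        // One REST call to Front50 for the whole list, instead of one call per pipeline.
        Front50Response response = front50Service.savePipelineList(pipelines);
        return response.getStatus() == 200; // maps to SUCCEEDED vs. TERMINAL in a real Orca task
      }
    }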

  4. Fiat API flow:

    4.1 Front50 checks with Fiat for permissions:

    @PreAuthorize( "@fiatPermissionEvaluator.storeWholePermission() and
    authorizationSupport.hasRunAsUserPermission(List<pipeline>)")

    4.2 storeWholePermission() checks that the permissions object is not null.
    4.3 hasPermission() checks whether the particular resource (APPLICATION) has WRITE permission for the save.
    4.4 The authorizationSupport.hasRunAsUserPermission() check happens in Front50,
    which calls Fiat to check whether the particular resource (APPLICATION) has EXECUTE permission.
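
    A minimal sketch of the run-as-user check in step 4.4, assuming Spring Security's PermissionEvaluator as the Fiat hook; the class name and context handling are illustrative, not Front50's actual code:

    import java.util.List;
    import java.util.Map;
    import org.springframework.security.access.PermissionEvaluator;
    import org.springframework.security.core.Authentication;
    import org.springframework.security.core.context.SecurityContextHolder;

    class RunAsUserAuthorizationSupport {
      private final PermissionEvaluator permissionEvaluator; // Fiat supplies the real evaluator

      RunAsUserAuthorizationSupport(PermissionEvaluator permissionEvaluator) {
        this.permissionEvaluator = permissionEvaluator;
      }

      /** Rejects the whole batch if any pipeline's application lacks EXECUTE permission. */
      public boolean hasRunAsUserPermission(List<Map<String, Object>> pipelines) {
        Authentication auth = SecurityContextHolder.getContext().getAuthentication();
        return pipelines.stream()
            .map(p -> (String) p.get("application"))
            .allMatch(app -> permissionEvaluator.hasPermission(auth, app, "APPLICATION", "EXECUTE"));
      }
    }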

  5. Front50 API flow:

    5.1 The POST request sent by Orca hits the PipelineController REST endpoint:

    com.netflix.spinnaker.front50.controllers.PipelineController.bulksave(
        @RequestBody List<Pipeline> pipelineList,
        @RequestParam(value = "staleCheck", required = false) Boolean staleCheck)

    5.2 PipelineDAO then saves the pipeline list and returns the response:

    pipelineDAO.bulkImport(savePipelineList);
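
    A minimal sketch of what pipelineDAO.bulkImport could do at the SQL layer, using a JDBC batch inside a single transaction; the table and column names and the Pipeline stand-in are assumptions, and Front50's real SQL implementation differs:

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;
    import java.util.List;

    // Stand-in for Front50's Pipeline model; only id and a serializer hook
    // (hypothetical) are needed here.
    interface Pipeline {
      String getId();
      String toJson();
    }

    class SqlPipelineBulkImport {
      void bulkImport(Connection conn, List<Pipeline> pipelines) throws SQLException {
        conn.setAutoCommit(false); // make the whole batch all-or-nothing
        String sql = "INSERT INTO pipelines (id, body, last_modified_at) VALUES (?, ?, ?)";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
          for (Pipeline pipeline : pipelines) {
            ps.setString(1, pipeline.getId());
            ps.setString(2, pipeline.toJson());        // serialized pipeline body
            ps.setLong(3, System.currentTimeMillis());
            ps.addBatch();                             // queue the row; nothing sent yet
          }
          ps.executeBatch();                           // send every queued row in one batch
          conn.commit();
        } catch (SQLException e) {
          conn.rollback();
          throw e;
        }
      }
    }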
    

Usage:

The bulk API is invoked using Gate. The invocation is:

curl -X POST -H "Content-type: application/json" -d '[
  {
    "keepWaitingPipelines": false,
    "limitConcurrent": true,
    "application": "test_004",
    "spelEvaluator": "v4",
    "name": "pipe1001",
    "stages": [
      {
        "requisiteStageRefIds": [],
        "name": "Wait",
        "refId": "1",
        "type": "wait",
        "waitTime": 6
      }
    ],
    "index": 1000,
    "triggers": []
  },
  {
    "keepWaitingPipelines": false,
    "limitConcurrent": true,
    "application": "test_005",
    "spelEvaluator": "v4",
    "name": "pipe1002",
    "stages": [
      {
        "requisiteStageRefIds": [],
        "name": "Wait",
        "refId": "1",
        "type": "wait",
        "waitTime": 6
      }
    ],
    "index": 1001,
    "triggers": []
  }
]' http://<ip address>:8084/pipelines/bulksave

Output:

{
  "Successful": <number of pipelines saved>,
  "Failed": <number of pipelines that failed>,
  "Failed_list": [<array of failed pipelines - (application, pipeline name, etc.) and the error message>]
}
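
For illustration, a filled-in response could look like the following; the values and the failed-entry field names are hypothetical:

{
  "Successful": 2,
  "Failed": 1,
  "Failed_list": [
    {
      "application": "test_006",
      "name": "pipe1003",
      "error": "Invalid pipeline definition"
    }
  ]
}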

Assumptions:

This solution works only for an RDBMS-backed (SQL) store.
Pipeline saves run in the context of an application, irrespective of the content of the pipeline.
In the case of bulk save, we will use the application name "bulk save".
The implication is that bulk-save tasks will not show up under Tasks in their respective applications.

Sample json payload:

An array of valid Spinnaker pipeline JSON documents.

Example:
[
  {
    "keepWaitingPipelines": false,
    "limitConcurrent": true,
    "application": "test_004",
    "spelEvaluator": "v4",
    "name": "pipe1001",
    "stages": [
      {
        "requisiteStageRefIds": [],
        "name": "Wait",
        "refId": "1",
        "type": "wait",
        "waitTime": 6
      }
    ],
    "index": 1000,
    "triggers": []
  },
  {
    "keepWaitingPipelines": false,
    "limitConcurrent": true,
    "application": "test_005",
    "spelEvaluator": "v4",
    "name": "pipe1002",
    "stages": [
      {
        "requisiteStageRefIds": [],
        "name": "Wait",
        "refId": "1",
        "type": "wait",
        "waitTime": 6
      }
    ],
    "index": 1001,
    "triggers": []
  },
  {
    "keepWaitingPipelines": false,
    "limitConcurrent": true,
    "application": "test_006",
    "spelEvaluator": "v4",
    "name": "pipe1003",
    "stages": [
      {
        "requisiteStageRefIds": [],
        "name": "Wait",
        "refId": "1",
        "type": "wait",
        "waitTime": 6
      }
    ],
    "index": 1002,
    "triggers": []
  }
]

updated bulksave.md with the spinnaker issue no.
@robzienert (Member) commented:

I'm unclear about when this would be used. Are we changing Deck to save multiple pipelines, and if so, when does that occur?

@emagana-zz commented:

@robzienert Hi there! @sanopsmx, his team, and Salesforce are working together on this new feature. Basically, we templatize all our pipelines, and we have macros that can apply to all of them (over 100K), so we need to update them all. This is why we need a bulk API to speed up writing the pipelines. Without this feature we must update one pipeline at a time, which makes the process extremely slow: one update pass could take over 5 hours for those 100K pipelines. Let me know if I can address more questions.

@ajordens (Contributor) commented Nov 3, 2020:

I added a comment on one of the PRs (I believe) but there's some prior art in front50 around this already.

See the /batchUpdate endpoints (edit: I see that the proposal looks to use them).

On the other hand, I think with this intended volume of discrete pipelines you're trying to fit a square peg into a round hole.

What some folks have done on our side is build a pipeline generator outside of Spinnaker. The generator itself is parameterized and generates pipeline execution json that is fed into orca. Each permutation is not tracked as an individual pipeline config.


2.1 The post request sent by the deck will hit the PipelineController Rest Controller endpoint.

com.netflix.spinnaker.gate.controllers.PipelineController.bulksavePipeline(Body pipeline..)
Reviewer (Contributor) commented:

Not sure why we need a dedicated endpoint for this ... can it not just be submitted asynchronously as a task and polled?

We should avoid taskService.createAndWaitForCompletion() wherever possible.

@sanopsmx (Author) replied:

We kept the dedicated endpoint as a REST call. We were able to submit 3000+ pipelines through a JSON payload and got a response time of ~2 minutes; hence we made the call synchronous. It can be called from Deck as well.

@sanopsmx (Author) replied:

We are adding only two REST endpoints: one in Gate and the other in Front50. We are not adding any REST endpoint in Orca.

@ajordens (Contributor) left a comment:

What happens if there are SpEL expressions? I believe the current SavePipelineTask deals with this by having the pipeline contents be Base64 encoded.

3.3 The QueueExecutionRunner will call the StartExecutionHandler.kt.
3.4 The StartExecutionHandler.kt will call the SavePipelineTask.java.
3.5 SavePipelineTask will call the front 50 rest api.
Response response = front50Service.savePipelineList(List<pipeline>);
Reviewer (Contributor) commented:

savePipelineList would be better as savePipelines

@sanopsmx (Author) replied:

done


5.1 The post request sent by the orca will hit the PipelineController Rest Controller endpoint.

com.netflix.spinnaker.front50.controllers.OperationsController.bulksave(
Reviewer (Contributor) commented:

I don't know what OperationsController is, I can only assume it's actually going to call the PipelineController.batchUpdate() endpoint and we'll be handling whether it's admin vs. non-admin?

Reviewer (Contributor) commented:

Caveat that the batchUpdate() endpoint wasn't expected to be exposed and lacks the validation provided by save().

@sanopsmx (Author) replied:

It should be PipelineController and not OperationsController. Corrected the document too.

@ajordens (Contributor) commented Dec 2, 2020:

Per my earlier comment: if the intention is to batch update 10k+ pipelines, I don't think this is the right approach.

@robzienert (Member) left a comment:

Agree with @ajordens.

My current vote is -1.

This proposal is written to support a use case which I think is misusing the system; a square peg through a round hole, as mentioned before. It's unclear to me how this change would be beneficial to the community as a whole.

Rather than this, I would prefer an RFC for adding extension points so you can do whatever you want. In that case, you already have support for creating a new bulk pipeline save stage in Orca, and then you'd just need some extension points in Front50 to service this request.

This being said, I do think the underlying use case is approaching the problem incorrectly.

@sanopsmx (Author) commented Dec 8, 2020:

Per my earlier comment: if the intention is to batch update 10k+ pipelines, I don't think this is the right approach.

Are you suggesting multipart/form-data files for batch updating 10k+ pipelines?

@dilippai commented:

Re: use case fit: I don't believe it's unreasonable to expect a pipeline orchestration system to scale to support a large number of actual pipelines, which may be created relatively dynamically. I'm concerned about cases that we considered critical to our choice of Spinnaker as a pipeline orchestration system being considered odd or abnormal. I would expect some degree of flexibility in how users use the platform. I don't see bulk creation of pipelines as an unusual pattern for a business with a very large footprint.

@ajordens (Contributor) commented:

Re: use case fit: I don't believe it's unreasonable to expect a pipeline orchestration system to scale to support a large number of actual pipelines, which may be created relatively dynamically. I'm concerned about cases that we considered critical to our choice of Spinnaker as a pipeline orchestration system being considered odd or abnormal. I would expect some degree of flexibility in how users use the platform. I don't see bulk creation of pipelines as an unusual pattern for a business with a very large footprint.

First and foremost, Spinnaker is a platform that provides a set of building blocks. In many cases, complete solutions can be built from these building blocks.

Investments have been made towards supporting extensibility for use cases that, at present, may fall outside of what's natively provided. Speaking as a Netflix engineer, we've been using these points of extension for many years, as have other large enterprises in the community. They are intended specifically for situations like this where a targeted solution is needed quickly.

Ignoring the merits of this RFC for a moment, the proposed PRs backing this RFC need significant improvement before they could be considered mergeable.

@rebalag commented Jan 29, 2021:

Supporting pipeline CRUD operations should be one of the core functions of Spinnaker as a platform. In many deployments, pipelines are managed with templates: pipeline JSON structures are generated from these templates based on enterprise requirements and then saved to Spinnaker through the API. This operation occurs frequently enough that there is a requirement for efficiently saving pipelines in bulk.

Changes to pipelines can occur in a Spinnaker deployment because of:

  1. Changes in policies associated with pipelines
  2. Changes to metadata in the pipelines for better analytics of executed pipelines
  3. Synchronizing pipelines in Spinnaker with Git spanning multiple applications

Applying these changes one pipeline at a time with the currently available API takes hours to complete. Exposing the batchUpdate API already available in Front50 can perform an order of magnitude faster.

It seems to me that saving pipelines one at a time or in bulk should belong to the Spinnaker core platform.

@ajordens (Contributor) commented:

Let's step back a bit.

I'm not against the idea of a bulk api but the particular implementation details called out in this proposal were problematic.

Proposals that have accompanying implementation details are challenging in that the very literal implementation often distracts from the underlying idea.

In the future, a governance RFC like this would benefit from focusing on the idea, motivations, alternatives, and any potential problems that might fall out, and less on "it's going to change this file, etc.".

@sanopsmx (Author) commented:

Supporting pipeline CRUD operations should be one of the core functions of Spinnaker as a platform. In many deployments, pipelines are managed with templates: pipeline JSON structures are generated from these templates based on enterprise requirements and then saved to Spinnaker through the API. This operation occurs frequently enough that there is a requirement for efficiently saving pipelines in bulk.

Changes to pipelines can occur in a Spinnaker deployment because of:

  1. Changes in policies associated with pipelines
  2. Changes to metadata in the pipelines for better analytics of executed pipelines
  3. Synchronizing pipelines in Spinnaker with Git spanning multiple applications

Applying these changes one pipeline at a time with the currently available API takes hours to complete. Exposing the batchUpdate API already available in Front50 can perform an order of magnitude faster.

It seems to me that saving pipelines one at a time or in bulk should belong to the Spinnaker core platform.

We do need a bulk save method of saving multiple pipelines in one go.

@sanopsmx (Author) commented:

Let's step back a bit.

I'm not against the idea of a bulk api but the particular implementation details called out in this proposal were problematic.

Proposals that have accompanying implementation details are challenging in that the very literal implementation often distracts from the underlying idea.

In the future, a governance RFC like this would benefit from focusing on the idea, motivations, alternatives, and any potential problems that might fall out, and less on "it's going to change this file, etc.".

You mentioned extensions in your previous comments. Would a plugin-based approach to saving multiple pipelines work here?

@sanopsmx (Author) commented Feb 3, 2021:

Raised a new PR, #214; hence closing this PR.
