Skip to content
This repository has been archived by the owner on Jun 7, 2023. It is now read-only.

Feature Request: Bulk insert / update of records #32

Open
3vivekb opened this issue Feb 2, 2016 · 15 comments
Open

Feature Request: Bulk insert / update of records #32

3vivekb opened this issue Feb 2, 2016 · 15 comments

Comments

@3vivekb
Copy link

3vivekb commented Feb 2, 2016

So right now within the R Code we can update one record at a time. But if we run a for loop and attempt to update many records, we can run into the 1000-15000 request limit. Also it's kinda slow.

Can somebody implement the bulk insert / update?

@StevenMMortimer
Copy link
Contributor

The Bulk API should have been included in the package starting at v0.8.0. If you type ?RForcecom and review the package index samples, you should see the following examples of how to use the Bulk API:

## BULK INSERT
 # create a sample data.frame of 1000 records
 n <- 1000
 data <- data.frame(Name=paste('New Record:', 1:n),
                    stringsAsFactors=FALSE)

 # run an insert job into the Account object
 job_info <- rforcecom.createBulkJob(session,
                                     operation='insert',
                                     object='Account')

 # split into batch sizes of 500 (2 batches for our 1000 row sample dataset)
 batches_info <- rforcecom.createBulkBatch(session,
                                           jobId=job_info$id,
                                           data,
                                           multiBatch = TRUE,
                                           batchSize=500)

 # check on status of each batch
 batches_status <- lapply(batches_info,
                          FUN=function(x){
                           rforcecom.checkBatchStatus(session,
                                                      jobId=x$jobId,
                                                      batchId=x$id)
                                                      })
 # get details on each batch
 batches_detail <- lapply(batches_info,
                          FUN=function(x){
                           rforcecom.getBatchDetails(session,
                                                     jobId=x$jobId,
                                                     batchId=x$id)
                                                     })
 # close the job
 close_job_info <- rforcecom.closeBulkJob(session, jobId=job_info$id)


 ## BULK DELETE THE PRIOR INSERT

 # format the data
 batch_details_together <- plyr::ldply(batches_detail)
 delete_ids <- data.frame(id=batch_details_together[,"Id"],
                          stringsAsFactors=FALSE)

 job_info <- rforcecom.createBulkJob(session, operation='delete', object='Account')
 batches_info <- rforcecom.createBulkBatch(session,
                                           jobId=job_info$id,
                                           data=delete_ids)
 # check on status of each batch
 batches_status <- lapply(batches_info,
                          FUN=function(x){
                           rforcecom.checkBatchStatus(session,
                                                      jobId=x$jobId,
                                                      batchId=x$id)
                                                      })
 # get details on each batch
 batches_detail <- lapply(batches_info,
                          FUN=function(x){
                           rforcecom.getBatchDetails(session,
                                                     jobId=x$jobId,
                                                     batchId=x$id)
                                                     })
 # close the job
 close_job_info <- rforcecom.closeBulkJob(session, jobId=job_info$id)


 ## BULK QUERY

 query <- "SELECT Id, Name FROM Account LIMIT 10"
 job_info <- rforcecom.createBulkJob(session, operation='query', object='Account')
 batch_query_info <- rforcecom.submitBulkQuery(session,
                                               jobId=job_info$id,
                                               query=query)

 batch_query_status <- rforcecom.checkBatchStatus(session,
                                                  jobId=batch_query_info$jobId,
                                                  batchId=batch_query_info$id)

 batch_query_details <- rforcecom.getBatchDetails(session,
                                                  jobId=batch_query_info$jobId,
                                                  batchId=batch_query_info$id)

 batch_query_recordset <- rforcecom.getBulkQueryResult(session,
                                                       jobId=batch_query_info$jobId,
                                                       batchId=batch_query_info$id,
                                                       resultId=batch_query_details$result)
 close_job_info <- rforcecom.closeBulkJob(session, jobId=job_info$id)


 ## BULK INSERT ATTACHMENTS

 # prepare your .zip file and request.txt manifest before calling these functions
 file <- 'request.zip'
 job_info <- rforcecom.createBulkJob(session, operation='insert', object='Attachment')
 batch_attachment_info <- rforcecom.insertBulkAttachments(session,
                                                          jobId=job_info$id,
                                                          file=file)
 batch_attachment_status <- rforcecom.checkBatchStatus(session,
                                                       jobId=batch_attachment_info$jobId,
                                                       batchId=batch_attachment_info$id)
 batch_attachment_details <- rforcecom.getBatchDetails(session,
                                                       jobId=batch_attachment_info$jobId,
                                                       batchId=batch_attachment_info$id)
 # close the job
 close_job_info <- rforcecom.closeBulkJob(session, jobId=job_info$id)


## End(Not run)

@ilcca
Copy link

ilcca commented Feb 11, 2016

How external ID should be set when creating bulk update/upsert?
I.e. upset requires externalID to be set, but doesn't allow to set it:

job_info <- rforcecom.createBulkJob(session, operation="upsert", object="Contact")
Error in rforcecom.createBulkJob(session, operation = "upsert", object = "Contact") :
InvalidJob: External ID was blank for Contact. An External ID must be specified for upsert.
job_info <- rforcecom.createBulkJob(session, operation="upsert", externalIdFieldName = "Member_ID__c", object="Contact")
Error in rforcecom.createBulkJob(session, operation = "upsert", externalIdFieldName = "Member_ID__c", : unused argument (externalIdFieldName = "Member_ID__c")

@StevenMMortimer
Copy link
Contributor

@ilcca Thanks for bringing up the issue. I've fixed and submitted a PR here: #35

The feature hasn't been merged yet, so if you really want to use it then you'll need to re-install RForcecom with the following command in R:
devtools::install_github('ReportMort/RForcecom', ref='metadata-api')

Here is an example of how to use:

job_info <- rforcecom.createBulkJob(session,  operation='upsert', 
                                    externalIdFieldName='My_External_Id__c', 
                                    object='Account')

my_data <- data.frame(My_External_Id__c=c('11111','22222', '99999'),  
                      Name=c('Updated_Name1', 'Updated_Name2', 'Upserted_Record'), 
                      stringsAsFactors=FALSE)

batches_info <- rforcecom.createBulkBatch(session, 
                                          jobId=job_info$id, 
                                          data=my_data)

# gather the batches result as you would with other operations (insert/update/delete)

@ilcca
Copy link

ilcca commented Feb 16, 2016

Steve,
thanks for fixing this so quick! I got your package and the upsert works well now.
However, to use update should there be external id applied as well? And what is the key id now when updating?

@StevenMMortimer
Copy link
Contributor

Updating does not require an external id. Simply include a column entitled "Id" in your data, set the bulk job to "update" and Salesforce will update the corresponding records based on that Id field

@ilcca
Copy link

ilcca commented Feb 23, 2016

So if I want to update rows at Contact, is the "Id" here the same one retrieved with SQL: "Select Id from Contact"? Hence to update rows I must first get the list of Ids to update. Am I right?

My idea was to use an External id to update rows at Contact.

@StevenMMortimer
Copy link
Contributor

Yes. "Id" here means the unique ID generated by Salesforce that corresponds to the record.

If you want to update based on External Id, then this might be a good strategy:

  1. Run SOQL: "Select Id, My_External_Id__c from Contact"
  2. In R, join your update data with this SOQL recordset on "My_External_Id__c", so that the "Id" column gets transferred into your update data.
  3. Ensure no duplicates got created during your join.
  4. Select the columns you want to send over to Salesforce (making sure you include the "Id" column)
  5. Create update bulk job and send over the data.frame.

@ilcca
Copy link

ilcca commented Feb 24, 2016

Thanks for the strategy, it makes sense.

However, when operating with bigger data then fetching all Ids from a Salesforce table can require lots of memory within R and traffic between Salesforce and R as well. That's why I'm wondering if there is a chance to make the sql query to include only needed Ids i.e. "Select Id, My_External_Id__c from Contact WHERE My_External_Id__c in LIST"
Here LIST means a list of External Id to be updated.

@Btibert3
Copy link

@ReportMort Looking over this thread, is it possible to do two things: A) update fields of information on a record and simultaneously B)attach a text file to that record?

@StevenMMortimer
Copy link
Contributor

@Btibert3 You'll have to do it in 2 passes, you cannot do it simultaneously because bulk jobs can only target one object at a time. Attachments must go to the Attachment object with a specially formatted zip file to upload and I think you're hinting that the records you want to update are in a different object (e.g. Account), so you'll have to create a second bulk job to do that.

@Btibert3
Copy link

@ReportMort Interesting, thanks for the quick reply. I am not as well versed in the API, but yes, we have a custom object that I want to add both facts to as well as attach the file. Because the record on the custom object would already exist prior to updating the fields and adding the file, does that matter. The part I am somewhat lost on is how I associate the attachment with the record of interest.

@StevenMMortimer
Copy link
Contributor

StevenMMortimer commented Jun 22, 2017

You'll need to create a zip file containing your attachments and a manifest file formatted as CSV that associates each document inside your zip file to a specific record id in Salesforce. You can see the example R code at the bottom of the package help, just type ?R`Forcecom-package` in your R console.

Here is the example code copy/pasted from there:

 ## BULK INSERT ATTACHMENTS
 
 # prepare your .zip file and request.txt manifest before calling these functions
 file <- 'request.zip'
 job_info <- rforcecom.createBulkJob(session, operation='insert', object='Attachment')
 batch_attachment_info <- rforcecom.insertBulkAttachments(session, 
                                                          jobId=job_info$id, 
                                                          file=file)
 batch_attachment_status <- rforcecom.checkBatchStatus(session, 
                                                       jobId=batch_attachment_info$jobId, 
                                                       batchId=batch_attachment_info$id)
 batch_attachment_details <- rforcecom.getBatchDetails(session, 
                                                       jobId=batch_attachment_info$jobId, 
                                                       batchId=batch_attachment_info$id)
 # close the job
 close_job_info <- rforcecom.closeBulkJob(session, jobId=job_info$id)

Here is the reference page on how to create your zip file of attachments with the appropriate manifest: https://developer.salesforce.com/docs/atlas.en-us.api_asynch.meta/api_asynch/binary_intro.htm. Your zip file is kept locally and you should create a CSV request.txt file since that is the default format used by the RForcecom package when inserting into the Attachment object.

@Btibert3
Copy link

You are the man, thanks for your help! Ok, seeing that, how am I am able to assign that attachment to a specific record on my custom object?

@StevenMMortimer
Copy link
Contributor

StevenMMortimer commented Jun 22, 2017

Your manifest.txt file will contain the Ids. The Ids are always unique, even across objects, so don't worry, the attachments will end up going to the right records on your custom object as long as you've specified their individual Ids properly.

I suggest, going through and updating all the records first, that way you know the Ids and can put those in your manifest.txt file.

@Btibert3
Copy link

Ahhhh! Let me take a look. Much appreciated.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants