
[FEATURE REQUEST] Descriptive error message with exception when having invalid data in BULK INSERT #2610

Open
atul-delphix opened this issue Feb 19, 2025 · 1 comment
atul-delphix commented Feb 19, 2025

Is your feature request related to a problem? If so, please give a short summary of the problem and how the feature would resolve it

We are using the mssql-jdbc driver version 12.4.2.jre11 with mssql-spark-connector version 3.1.

While loading data from a CSV file into the target database, any invalid data (e.g., a value in the CSV file longer than the corresponding column in the target table) causes the load to fail with the generic error message "Bulk load data was expected but not sent. The batch will be terminated."
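
For context, our write path is roughly the sketch below (file paths, names, and credentials are illustrative rather than our actual job; the format string and options follow the mssql-spark-connector's documented data source options):

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SaveMode;
    import org.apache.spark.sql.SparkSession;

    public class CsvToSqlServer {
        public static void main(String[] args) {
            SparkSession spark = SparkSession.builder().appName("csv-load").getOrCreate();

            // Read the CSV file; an oversized value does not fail here, only
            // later, inside the bulk copy running on an executor.
            Dataset<Row> df = spark.read().option("header", "true").csv("/data/input.csv");

            df.write()
                    .format("com.microsoft.sqlserver.jdbc.spark")
                    .mode(SaveMode.Append)
                    .option("url", "jdbc:sqlserver://host;databaseName=targetdb;encrypt=true;trustServerCertificate=true")
                    .option("dbtable", "dbo.test_table_target")
                    .option("user", "user")
                    .option("password", "password")
                    .save();
        }
    }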

From this message alone it is almost impossible to determine the root cause of the failure. Please refer to the sample stack trace from such a failure:

08-08-2024 05:46:40.107 [task-result-getter-1] WARN  o.a.spark.scheduler.TaskSetManager.logWarning - Lost task 0.0 in stage 189.0 (TID 189) (af5307e43913 executor driver): com.microsoft.sqlserver.jdbc.SQLServerException: Bulk load data was expected but not sent. The batch will be terminated.
	at com.microsoft.sqlserver.jdbc.SQLServerException.makeFromDatabaseError(SQLServerException.java:261)
	at com.microsoft.sqlserver.jdbc.TDSTokenHandler.onEOF(tdsparser.java:316)
	at com.microsoft.sqlserver.jdbc.TDSParser.parse(tdsparser.java:137)
	at com.microsoft.sqlserver.jdbc.TDSParser.parse(tdsparser.java:42)
	at com.microsoft.sqlserver.jdbc.TDSParser.parse(tdsparser.java:31)
	at com.microsoft.sqlserver.jdbc.SQLServerConnection$1ConnectionCommand.doExecute(SQLServerConnection.java:4533)
	at com.microsoft.sqlserver.jdbc.TDSCommand.execute(IOBuffer.java:7748)
	at com.microsoft.sqlserver.jdbc.SQLServerConnection.executeCommand(SQLServerConnection.java:4410)
	at com.microsoft.sqlserver.jdbc.SQLServerConnection.connectionCommand(SQLServerConnection.java:4541)
	at com.microsoft.sqlserver.jdbc.SQLServerConnection.rollback(SQLServerConnection.java:4747)
	at com.microsoft.sqlserver.jdbc.spark.BulkCopyUtils$.savePartition(BulkCopyUtils.scala:68)
	at com.microsoft.sqlserver.jdbc.spark.SingleInstanceWriteStrategies$.$anonfun$write$2(BestEffortSingleInstanceStrategy.scala:43)
	at com.microsoft.sqlserver.jdbc.spark.SingleInstanceWriteStrategies$.$anonfun$write$2$adapted(BestEffortSingleInstanceStrategy.scala:42)
	at org.apache.spark.rdd.RDD.$anonfun$foreachPartition$2(RDD.scala:1009)
	at org.apache.spark.rdd.RDD.$anonfun$foreachPartition$2$adapted(RDD.scala:1009)
	at org.apache.spark.SparkContext.$anonfun$runJob$5(SparkContext.scala:2303)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:92)
	at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:161)
	at org.apache.spark.scheduler.Task.run(Task.scala:139)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:554)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1529)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:557)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
	at java.base/java.lang.Thread.run(Thread.java:840)

Describe the preferred solution

The error should clearly identify which table and column are affected, together with the offending value. For example:

 String or binary data would be truncated in table 'SourceDB2019.dbo.test_table_target', column 'char_col_with_2_length'. Truncated value: 'lo'
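
For comparison, SQL Server can already report this level of detail for a plain INSERT. A minimal sketch (table and column names taken from the example message above; this assumes a server version/compatibility level where the descriptive truncation error, msg 2628, is active):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.SQLException;
    import java.sql.Statement;

    public class TruncationMessageDemo {
        public static void main(String[] args) {
            String url = "jdbc:sqlserver://host;databaseName=SourceDB2019;encrypt=true;trustServerCertificate=true;user=user;password=password";
            try (Connection con = DriverManager.getConnection(url);
                    Statement stmt = con.createStatement()) {
                stmt.executeUpdate("CREATE TABLE dbo.test_table_target (char_col_with_2_length CHAR(2))");
                // 'long' does not fit in CHAR(2); with the descriptive truncation
                // error enabled, the exception names the table, the column, and
                // the truncated value ('lo').
                stmt.executeUpdate("INSERT INTO dbo.test_table_target VALUES ('long')");
            } catch (SQLException e) {
                System.err.println(e.getMessage());
            }
        }
    }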

Describe alternatives you've considered

Additional context

Reference Documentations/Specifications

Reference Implementation

github-project-automation bot moved this to To be triaged in MSSQL JDBC Feb 19, 2025
machavan self-assigned this Feb 20, 2025
machavan added the Under Investigation label Feb 20, 2025

machavan (Contributor) commented:
Hi @atul-delphix

When I tried to reproduce the issue with the code snippet below

        String url = "jdbc:sqlserver://instance;database=db;encrypt=true;trustServerCertificate=true;username=user;password=password";

        Class.forName("com.microsoft.sqlserver.jdbc.SQLServerDriver");

        String csvFilePath = "C:\\Users\\Bob\\Downloads\\test.csv"; 
        String destinationTable = "dbo.VARCHAR_TABLE"; 

        int countBefore, countAfter;

        // Get data from the source file by loading it into a class that implements ISQLServerBulkRecord.
        // Here we are using the SQLServerBulkCSVFileRecord implementation to import the example CSV file.
        try (Connection destinationConnection = DriverManager.getConnection(url);
                Statement stmt = destinationConnection.createStatement();
                SQLServerBulkCopy bulkCopy = new SQLServerBulkCopy(destinationConnection);
                SQLServerBulkCSVFileRecord fileRecord = new SQLServerBulkCSVFileRecord(csvFilePath, true);) {
            
            fileRecord.setEscapeColumnDelimitersCSV(true);
            // Set the metadata for each column to be copied.
            fileRecord.addColumnMetadata(1, null, java.sql.Types.INTEGER, 0, 0);
            fileRecord.addColumnMetadata(2, null, java.sql.Types.VARCHAR, 10, 0);
            fileRecord.addColumnMetadata(3, null, java.sql.Types.VARCHAR, 10, 0);

            // Empty the destination table.
            stmt.executeUpdate("DELETE FROM " + destinationTable);

            // Perform an initial count on the destination table.
            countBefore = getRowCount(stmt, destinationTable);

            // Set up the bulk copy object.
            // Note that the column positions in the source
            // data reader match the column positions in
            // the destination table so there is no need to
            // map columns.
            bulkCopy.setDestinationTableName(destinationTable);

            // Write from the source to the destination.
            bulkCopy.writeToServer(fileRecord);

            // Perform a final count on the destination
            // table to see how many rows were added.
            countAfter = getRowCount(stmt, destinationTable);
            System.out.println((countAfter - countBefore) + " rows were added.");
        }
        // Handle any errors that may have occurred.
        catch (SQLException e) {
            e.printStackTrace();
        }    
   }

with CSV contents like this:

1,Hi,Hi
2,Hello,Hello
3,There,Thereeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee

it throws an appropriate error message:

com.microsoft.sqlserver.jdbc.SQLServerException: The given value of type VARCHAR(132) from the data source cannot be converted to type varchar(10) of the specified target column COL2.
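
For reference, the destination table assumed here can be created as below (the schema is inferred from the addColumnMetadata calls and the error message above, since the DDL was not part of the repro):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;

    public class CreateVarcharTable {
        public static void main(String[] args) throws Exception {
            String url = "jdbc:sqlserver://instance;database=db;encrypt=true;trustServerCertificate=true;username=user;password=password";
            try (Connection con = DriverManager.getConnection(url);
                    Statement stmt = con.createStatement()) {
                // Schema inferred from the repro: an INT column plus two VARCHAR(10) columns.
                stmt.executeUpdate("CREATE TABLE dbo.VARCHAR_TABLE (COL1 INT, COL2 VARCHAR(10), COL3 VARCHAR(10))");
            }
        }
    }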

Would you like to suggest changes to this repro so that it produces the error you encountered?

machavan added the Waiting for Response label Feb 20, 2025