# Activity · apache/spark

## September 19, 2024

**23:37 UTC — dependabot[bot] deleted branch `dependabot/maven/com.google.protobuf-protobuf-java-3.25.5`.**

**19:32 UTC — Dongjoon Hyun pushed 1 commit to `master`: [SPARK-49720][PYTHON][INFRA] Add a script to clean up PySpark temp files**

Adds a dev-only script to clean up PySpark temp files. A stale `pyspark.zip` occasionally causes odd issues, and removing it restores the expected behavior, so a dedicated cleanup script is worth having. No user-facing change; tested manually; no generative AI tooling. Closes #48167 from zhengruifeng/py_infra_cleanup. Authored-by: Ruifeng Zheng. Signed-off-by: Dongjoon Hyun.
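The commit message does not include the script itself; a rough sketch of what such a cleanup could look like is below. The paths and the set of removed artifacts are assumptions for illustration, not the actual script added by the PR.

```python
#!/usr/bin/env python3
# Hypothetical sketch: remove stale PySpark build artifacts from a Spark checkout.
import pathlib
import shutil

repo_root = pathlib.Path(__file__).resolve().parents[1]  # assumes the script lives one level below the repo root

# Stale bytecode caches under python/ can shadow freshly edited sources.
for cache_dir in (repo_root / "python").rglob("__pycache__"):
    shutil.rmtree(cache_dir, ignore_errors=True)

# A stale pyspark.zip left over from an earlier build can also cause surprises.
stale_zip = repo_root / "python" / "lib" / "pyspark.zip"
if stale_zip.exists():
    stale_zip.unlink()
```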
**19:31 UTC — Dongjoon Hyun pushed 1 commit to `master`: [SPARK-49718][PS] Switch `Scatter` plot to sampled data**

Switches the pandas-on-Spark `Scatter` plot from the first 1,000 rows to a sampled 1,000 rows. When the data distribution is correlated with row order, the first n rows are not representative of the whole dataset, for example:

```py
import pandas as pd
import numpy as np
import pyspark.pandas as ps

# ps.set_option("plotting.max_rows", 10000)
np.random.seed(123)

pdf = pd.DataFrame(np.random.randn(10000, 4), columns=list('ABCD')).sort_values("A")
psdf = ps.DataFrame(pdf)

psdf.plot.scatter(x='B', y='A')
```

(The PR description attaches three plots: all 10k data points, the old behavior using the first 1k points, and the new behavior using a 1k sample.) User-facing change: yes. Tested in CI and manually; no generative AI tooling. Closes #48164 from zhengruifeng/ps_scatter_sampling. Authored-by: Ruifeng Zheng. Signed-off-by: Dongjoon Hyun.

**19:19 UTC — Dongjoon Hyun pushed 1 commit to `branch-3.4`: [SPARK-46535][SQL][3.4] Fix NPE when describe extended a column without col stats**

Backport of #44524 to branch-3.4. Running `DESCRIBE TABLE EXTENDED` on a column of a v2 table that has no column statistics threw a `NullPointerException` from `DescribeColumnExec.run` ("Cannot invoke ColumnStatistics.min() because the return value of scala.Option.get() is null"). Adds a test for describe extended (formatted) on a column without column stats; no generative AI tooling. Closes #48160 from saitharun15/SPARK-46535-branch-3.4. Lead-authored-by: saitharun15; co-authored-by: Sai Tharun. Signed-off-by: Dongjoon Hyun.

**17:21 UTC — dependabot[bot] deleted branch `dependabot/bundler/docs/google-protobuf-3.25.5`.**

**17:09 UTC — Dongjoon Hyun pushed 1 commit to `master`: [SPARK-49716][PS][DOCS][TESTS] Fix documentation and add test of barh plot**

Updates the barh-plot documentation to clarify that Plotly and Matplotlib interpret the `x`/`y` axes differently, and adds a test with multiple columns as the value axis. The parameter difference is demonstrated by:

```py
>>> df = ps.DataFrame({'lab': ['A', 'B', 'C'], 'val': [10, 30, 20]})
>>> df.plot.barh(x='val', y='lab').show()  # plot1 (Plotly)

>>> ps.set_option('plotting.backend', 'matplotlib')
>>> import matplotlib.pyplot as plt
>>> df.plot.barh(x='lab', y='val')
>>> plt.show()  # plot2 (Matplotlib)
```

(The two resulting plots are attached to the PR.) The differing axis behavior may confuse users, so the updated documentation and tests help ensure clarity and prevent misinterpretation. Doc change only; tested with unit tests; no generative AI tooling. Closes #48161 from xinrong-meng/ps_barh. Authored-by: Xinrong Meng. Signed-off-by: Dongjoon Hyun.

**17:06 UTC — Dongjoon Hyun pushed 1 commit to `master`: [SPARK-49719][SQL] Make `UUID` and `SHUFFLE` accept integer `seed`**

In most cases a seed can be an int or a long, but `UUID` and `SHUFFLE` only accepted a long seed: `SELECT UUID(1)` and `SELECT SHUFFLE(array(1, 20, 3, 5), 1)` failed with `AnalysisException: [INVALID_PARAMETER_VALUE.LONG] ... expects a long literal, but got "1"`, while `RAND(1)` and the `1L` variants worked. After the fix, the integer-seed calls return the same results as their `1L` counterparts. User-facing change: yes. Tested with added tests; no generative AI tooling. Closes #48166 from zhengruifeng/int_seed. Authored-by: Ruifeng Zheng. Signed-off-by: Dongjoon Hyun.

**16:27 UTC — dependabot[bot] created branch `dependabot/maven/com.google.protobuf-protobuf-java-3.25.5`: Bump com.google.protobuf:protobuf-java from 3.25.4 to 3.25.5**

Bumps [com.google.protobuf:protobuf-java](https://github.com/protocolbuffers/protobuf) from 3.25.4 to 3.25.5 (direct production dependency). Release notes, changelog, and commits: https://github.com/protocolbuffers/protobuf/compare/v3.25.4...v3.25.5. Signed-off-by: dependabot[bot].

**16:26 UTC — dependabot[bot] created branch `dependabot/bundler/docs/google-protobuf-3.25.5`: Bump google-protobuf from 3.25.3 to 3.25.5 in /docs**

Bumps [google-protobuf](https://github.com/protocolbuffers/protobuf) from 3.25.3 to 3.25.5 (indirect dependency). Release notes, changelog, and commits: https://github.com/protocolbuffers/protobuf/compare/v3.25.3...v3.25.5. Signed-off-by: dependabot[bot].

**13:11 UTC — Ruifeng Zheng pushed 1 commit to `master`: [SPARK-49693][PYTHON][CONNECT] Refine the string representation of `timedelta`**

Refines the string representation of `timedelta` literals to follow the ISO format, so the raw internal value is not leaked (note that the JVM `Duration` and pandas use different units). User-facing change: yes. In PySpark Classic, `sf.lit(datetime.timedelta(1, 1))` renders as `Column<'PT24H1S'>`; in PySpark Connect it previously rendered as `Column<'86401000000'>` and now renders as `Column<'P1DT0H0M1S'>`. Tested with an added test; no generative AI tooling. Closes #48159 from zhengruifeng/pc_lit_delta. Authored-by: Ruifeng Zheng. Signed-off-by: Ruifeng Zheng.
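For context, the new Connect representation is an ISO-8601-style duration built from the day/hour/minute/second components. A minimal sketch of that kind of formatting (not the PR's actual implementation; it ignores negative durations):

```python
import datetime

def iso_duration(td: datetime.timedelta) -> str:
    # Split the timedelta into hour / minute / second components.
    hours, secs = divmod(td.seconds, 3600)
    minutes, secs = divmod(secs, 60)
    # Append fractional seconds only when microseconds are present.
    sec_str = f"{secs}.{td.microseconds:06d}" if td.microseconds else str(secs)
    return f"P{td.days}DT{hours}H{minutes}M{sec_str}S"

print(iso_duration(datetime.timedelta(1, 1)))  # P1DT0H0M1S
```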
**13:02 UTC — asfgit pushed 1 commit to `master`: [SPARK-49422][CONNECT][SQL] Add groupByKey to sql/api**

Adds `Dataset.groupByKey(..)` to the shared interface; it was missed in the previous PR, and the shared interface needs to support all functionality. No user-facing change; covered by existing tests; no generative AI tooling. Closes #48147 from hvanhovell/SPARK-49422-follow-up. Authored-by: Herman van Hovell. Signed-off-by: Herman van Hovell.

**13:01 UTC — Ruifeng Zheng pushed 1 commit to `master`: [SPARK-49717][SQL][TESTS] Function parity test ignore private[xxx] functions**

Makes the function parity test ignore private functions: the existing test relies on `java.lang.reflect.Modifier`, which cannot properly handle `private[xxx]` visibility. Test-only change; verified in CI; no generative AI tooling. Closes #48163 from zhengruifeng/df_func_test. Authored-by: Ruifeng Zheng. Signed-off-by: Ruifeng Zheng.

**12:25 UTC — Wenchen Fan pushed 1 commit to `master`: [SPARK-49667][SQL] Disallowed CS_AI collators with expressions that use StringSearch**

Disallows `CS_AI`-collated strings in expressions that use `StringSearch` in their implementation: `trim`, `startswith`, `endswith`, `locate`, `instr`, `str_to_map`, `contains`, `replace`, `split_part`, and `substring_index`. These expressions nominally supported all collations but did not work properly with `CS_AI` collators, because ICU's `StringSearch` class has no support for `CS_AI` search; for example, `startswith('hOtEl' collate unicode_ai, 'Hotel' collate unicode_ai)` currently returns `true`. The change is needed for correct behavior of these expressions. No user-facing change; tested with a new test in `CollationSuite`; no generative AI tooling. Closes #48121 from vladanvasi-db/vladanvasi-db/cs-ai-collations-expressions-disablement. Authored-by: Vladan Vasić. Signed-off-by: Wenchen Fan.
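A repro sketch of the example in the commit message, assuming a running SparkSession `spark` on a build with collation support. Before this change the query returned `true`, which is wrong for a case-sensitive collation; after it, the expression is expected to be rejected rather than return a wrong answer (the exact error is not given in the commit message).

```python
# 'hOtEl' and 'Hotel' differ only in case; UNICODE_AI is case-sensitive but
# accent-insensitive, so startswith should not report a match here.
spark.sql(
    "SELECT startswith('hOtEl' COLLATE UNICODE_AI, 'Hotel' COLLATE UNICODE_AI)"
).show()
```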
**09:56 UTC — Wenchen Fan pushed 1 commit to `master`: [SPARK-48280][SQL][FOLLOW-UP] Add expressions that are built via expressionBuilder to Expression Walker**

Adds expressions that are built via `expressionBuilder` to the expression-walker suite and improves the descriptions of its methods. While debugging it was noticed that `startsWith`, `endsWith`, and `contains` were not covered by the suite, even though these expressions are at the core of collation testing. Test-only change; no generative AI tooling. Closes #48162 from mihailom-db/expressionwalkerfollowup. Authored-by: Mihailo Milosevic. Signed-off-by: Wenchen Fan.

**09:09 UTC — Wenchen Fan pushed 1 commit to `master`: [SPARK-48782][SQL] Add support for executing procedures in catalogs**

Adds support for executing procedures in catalogs by introducing CALL commands, per the [discussed and voted](https://lists.apache.org/thread/w586jr53fxwk4pt9m94b413xyjr1v25m) SPIP tracked in [SPARK-44167](https://issues.apache.org/jira/browse/SPARK-44167). User-facing change: yes (new CALL commands). The PR comes with tests; no generative AI tooling. Closes #47943 from aokolnychyi/spark-48782. Authored-by: Anton Okolnychyi. Signed-off-by: Wenchen Fan.
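An illustrative sketch of what invoking such a procedure could look like. The catalog, procedure name, and arguments below are hypothetical, and the precise CALL grammar is the one defined by the SPIP, not this line; it also requires a catalog that actually exposes procedures.

```python
# Invoke a (hypothetical) procedure exposed by a catalog plugin via the new CALL command.
spark.sql("CALL my_catalog.system.my_procedure(42, 'events')").show()
```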
**07:27 UTC — Maxim Gekk pushed 1 commit to `master`: [SPARK-49611][SQL][FOLLOW-UP] Fix wrong results of collations() TVF**

Fixes the accent-sensitive and case-sensitive columns returned by the `collations()` TVF: when the initial PR landed, the ICU collation listing generated the columns in a different order, so the results were wrong. No user-facing change, since Spark 4.0 has not been released yet. Verified by the existing test in CollationSuite.scala, which was itself wrong before; no generative AI tooling. Closes #48152 from mihailom-db/tvf-collations-followup. Authored-by: Mihailo Milosevic. Signed-off-by: Max Gekk.

**01:36 UTC — Hyukjin Kwon pushed 1 commit to `master`: Revert "[SPARK-49422][CONNECT][SQL] Add groupByKey to sql/api"**

Reverts commit af45902d33c4d8e38a6427ac1d0c46fe057bb45a.

**00:16 UTC — Hyukjin Kwon pushed 3 commits to `master`, head commit: Revert "[SPARK-49495][DOCS] Document and Feature Preview on the master branch via Live GitHub Pages Updates"**

Reverts commit b1807095bef9c6d98e60bdc2669c8af93bc68ad4.

**00:11 UTC — asfgit pushed 1 commit to `master`: [SPARK-49422][CONNECT][SQL] Add groupByKey to sql/api**

Same change and description as the 13:02 push above (closes #48147 from hvanhovell/SPARK-49422-follow-up); this initial push was reverted at 01:36 and later re-applied. Authored-by: Herman van Hovell. Signed-off-by: Herman van Hovell.

**00:10 UTC — Hyukjin Kwon pushed 1 commit to `master`: [SPARK-49684][CONNECT] Remove global locks from session and execution managers**

Eliminates the use of global locks in the session and execution managers, in order to achieve true scalability. The locks residing in the streaming query manager are not easily removable because the tag and query maps seemingly need to be synchronized. No user-facing change; covered by existing tests; no generative AI tooling. Closes #48131 from changgyoopark-db/SPARK-49684. Authored-by: Changgyoo Park. Signed-off-by: Hyukjin Kwon.

**00:10 UTC — asfgit pushed 1 commit to `master`: [SPARK-49568][CONNECT][SQL] Remove self type from Dataset**

Removes the self-type parameter from `Dataset`, which made using the classes in sql/api noisy. The self type is replaced by a combination of covariant return types and abstract types; abstract types are used when a method takes a `Dataset` (or a `KeyValueGroupedDataset`) as an argument. No user-facing change; covered by existing tests; no generative AI tooling. Closes #48146 from hvanhovell/SPARK-49568. Authored-by: Herman van Hovell. Signed-off-by: Herman van Hovell.

**00:09 UTC — Hyukjin Kwon pushed 1 commit to `master`: [SPARK-49688][CONNECT][TESTS] Fix a sporadic `SparkConnectServiceSuite` failure**

Adds a short wait loop to ensure the test precondition is met: `VerifyEvents.executeHolder` is set asynchronously by `MockSparkListener.onOtherEvent`, whereas the test assumed it is always available. Improves the development experience; no user-facing change; verified with SparkConnectServiceSuite; no generative AI tooling. Closes #48142 from changgyoopark-db/SPARK-49688. Authored-by: Changgyoo Park. Signed-off-by: Hyukjin Kwon.

## September 18, 2024

**23:54 UTC — Hyukjin Kwon pushed 1 commit to `master`: [SPARK-49673][CONNECT] Increase CONNECT_GRPC_ARROW_MAX_BATCH_SIZE to 0.7 * CONNECT_GRPC_MAX_MESSAGE_SIZE**

Increases the default `maxBatchSize` from 0.7 * 4 MiB to 0.7 * 128 MiB (CONNECT_GRPC_MAX_MESSAGE_SIZE), making better use of the allowed maximum message size. The limit is used when creating Arrow batches for the `SqlCommandResult` in the `SparkConnectPlanner` and for `ExecutePlanResponse.ArrowBatch` in `processAsArrowBatches`; this lets much larger `LocalRelation`s be returned in the `SqlCommandResult` (for example for the `SHOW PARTITIONS` command) while staying within the gRPC message size limit. Needed because some `SqlCommandResult`s exceed 0.7 * 4 MiB. User-facing change: `SqlCommandResult`s up to 0.7 * 128 MiB are now supported instead of only up to 0.7 * 4 MiB, and `ExecutePlanResponse`s make better use of the 128 MiB limit. Covered by existing tests; no generative AI tooling. Closes #48122 from dillitz/increase-sql-command-batch-size. Authored-by: Robert Dillitz. Signed-off-by: Hyukjin Kwon.

**23:46 UTC — Ruifeng Zheng pushed 1 commit to `master`: [SPARK-49692][PYTHON][CONNECT] Refine the string representation of literal date and datetime**

Refines the string representation of literal `date` and `datetime` values: the literals should not be represented by their internal values, and the representation should be consistent with PySpark Classic where possible (Connect only holds an unresolved expression, so the representations cannot always match, but they should where they can). User-facing change: yes. Before, `lit(datetime.date(2024, 7, 10))` rendered as `Column<'19914'>` and `lit(datetime.datetime(2024, 7, 10, 1, 2, 3, 456))` as `Column<'1720544523000456'>`; after, they render as `Column<'2024-07-10'>` and `Column<'2024-07-10 01:02:03.000456'>`. Tested with added tests; no generative AI tooling. Closes #48137 from zhengruifeng/py_connect_lit_dt. Authored-by: Ruifeng Zheng. Signed-off-by: Ruifeng Zheng.
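The old strings are the internal epoch-based values. A quick sketch of how they map to the new representations; the timestamp arithmetic assumes a UTC+8 session time zone, which is what the commit's numbers imply.

```python
import datetime

# Dates were shown as days since the Unix epoch:
print(datetime.date(1970, 1, 1) + datetime.timedelta(days=19914))
# 2024-07-10

# Timestamps were shown as microseconds since the epoch (UTC); in a UTC+8
# session time zone this instant renders as 2024-07-10 01:02:03.000456.
print(datetime.datetime(1970, 1, 1) + datetime.timedelta(microseconds=1720544523000456))
# 2024-07-09 17:02:03.000456
```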
**22:45 UTC — Gengliang Wang pushed 1 commit to `master`: [SPARK-48939][AVRO] Support reading Avro with recursive schema reference**

(Discussion continued from #47425 into this PR because the committer could not push to Yuchen's account.) The built-in Protobuf connector already supports recursive schema references: users specify an option `recursive.fields.max.depth`, and the recursive field is unrolled to that level at the start of execution, turning a per-row dynamic schema into a fixed schema that Spark supports. This PR adopts the same approach for Avro by adding a `recursiveFieldMaxDepth` option to both the Avro data source and the `from_avro` function, so Spark can read recursive Avro schemas up to a given depth. A recursive reference means a field's type is defined in one of its parent nodes; a simple example, written in the Avro schema DSL and representing a linked list:

```
{
  "type": "record",
  "name": "LongList",
  "fields" : [
    {"name": "value", "type": "long"},
    {"name": "next", "type": ["null", "LongList"]}
  ]
}
```

Spark previously threw an error on such schemas, which many users rely on. User-facing change: by default the same error is still thrown, but when the option is set to a number greater than 0 the schema is unrolled to that depth. Tested with new unit and integration tests in AvroSuite and AvroFunctionSuite; no generative AI tooling. Closes #48043 from WweiL/yuchen-avro-recursive-schema. Lead-authored-by: Yuchen Liu; co-authored-by: Wei Liu. Signed-off-by: Gengliang Wang.
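For illustration, reading data written with such a schema might look like the following once the option is set. The option name comes from the commit message; the path and depth are placeholders, and leaving the option unset keeps the old behavior of rejecting recursive schemas.

```python
# Unroll the recursive `LongList` reference up to three levels while reading.
df = (
    spark.read.format("avro")
    .option("recursiveFieldMaxDepth", 3)
    .load("/tmp/long_list.avro")  # placeholder path
)
df.printSchema()
```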
**20:06 UTC — Maxim Gekk pushed 1 commit to `master`: [SPARK-47263][SQL] Assign names to the legacy conditions `_LEGACY_ERROR_TEMP_13[44-46]`**

Replaces the legacy error-class names `_LEGACY_ERROR_TEMP_13[44-46]` with semantically explicit ones: 1344 is removed, 1345 becomes `DEFAULT_UNSUPPORTED`, and 1346 becomes `ADD_DEFAULT_UNSUPPORTED`. No user-facing change. Verified by re-running the test classes modified in the PR (org.apache.spark.sql.sources.InsertSuite and org.apache.spark.sql.types.StructTypeSuite); no generative AI tooling. Closes #46320 from PaysonXu/SPARK-47263. Authored-by: xuping <13289341606@163.com>. Signed-off-by: Max Gekk.

**16:28 UTC — Dongjoon Hyun pushed 1 commit to `master`: [SPARK-49691][PYTHON][CONNECT] Function `substring` should accept column names**

Bug fix: the following works in PySpark Classic but failed in Connect,

```
>>> import pyspark.sql.functions as sf
>>> df = spark.createDataFrame([('Spark', 2, 3)], ['s', 'p', 'l'])
>>> df.select('*', sf.substring('s', 'p', 'l')).show()
```

raising `NumberFormatException: [CAST_INVALID_INPUT] The value 'p' of the type "STRING" cannot be cast to "INT" because it is malformed. Correct the value as per the syntax, or change its target type. Use try_cast to tolerate malformed input and return NULL instead. SQLSTATE: 22018`. User-facing change: yes, `substring` in Connect now properly handles column names. Tested with new doctests; no generative AI tooling. Closes #48135 from zhengruifeng/py_substring_fix. Authored-by: Ruifeng Zheng. Signed-off-by: Dongjoon Hyun.

**14:44 UTC — Dongjoon Hyun pushed 1 commit to `master`: [SPARK-49495][DOCS][FOLLOWUP] Enable GitHub Pages settings via .asf.yml**

A follow-up of SPARK-49495 that enables the GitHub Pages settings via [.asf.yaml](https://cwiki.apache.org/confluence/pages/viewpage.action?spaceKey=INFRA&title=git+-+.asf.yaml+features#Git.asf.yamlfeatures-GitHubPages), to meet the requirement of the `actions/configure-pages@v5` action, which failed with "Get Pages site failed. Please verify that the repository has Pages enabled and configured to build using GitHub Actions, or consider exploring the `enablement` parameter for this action. / HttpError: Not Found - https://docs.github.com/rest/pages/pages#get-a-apiname-pages-site". No user-facing change; not separately tested (N/A); no generative AI tooling. Closes #48141 from yaooqinn/SPARK-49495-FF. Authored-by: Kent Yao. Signed-off-by: Dongjoon Hyun.

**05:54 UTC — Kent Yao pushed 1 commit to `master`: [SPARK-49495][DOCS][FOLLOWUP] Fix Pandoc installation for GitHub Pages publication action**

The action `pandoc/actions/setup` is not allowed by the ASF organization account, so this follow-up makes the Pandoc installation step manual to fix CI. No user-facing change. Verified at https://github.com/yaooqinn/spark/actions/runs/10914663049/job/30293151174; no generative AI tooling. Closes #48136 from yaooqinn/SPARK-49495-F. Authored-by: Kent Yao. Signed-off-by: Kent Yao.

**04:14 UTC — Dongjoon Hyun pushed 1 commit to `master`: [SPARK-49682][BUILD] Upgrade joda-time to 2.13.0**

Upgrades joda-time from 2.12.7 to 2.13.0, which updates the `DateTimeZone` data to version `2024bgtz`. Full release notes: https://www.joda.org/joda-time/changes-report.html#a2.13.0. No user-facing change; passes GA; no generative AI tooling. Closes #48130 from panbingkun/SPARK-49682. Authored-by: panbingkun. Signed-off-by: Dongjoon Hyun.

*Older activity continues on the next page of the feed.*