-
Notifications
You must be signed in to change notification settings - Fork 62
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
OPTIONAL MATCH issues + Continued Development and Support \ Community for Morpheus (Spark 3 support, bugfixes ) #947
Comments
So , an update -> i was able to resolve this bug with a strange workaround -> the workaround is probably far from ideal, but it works -> basically this bug ( consecutive optional matches where the first doesn't match and the second does match but is not returned) is due to the fact that the join columns also join on the non matched label and properties (although they are null) which causes them not to return . I think solving it is a MUST in order to actually say optional match is supported in this project ( what's the point of having optional match support that works only if it matches ? ) therefore , my workaround was to drop all columns from the right hand side operator , which contain only null values -> and then when the common expressions are calculated , they do not include those columns and hence there is no join on a null value column . the better way was to deal just with the join columns , but this creates other issues . the main issue was that when joining , the auto calculation of the join expression is wrong , as it takes into account unrelated columns from the rhs and lhs and uses them as key , much more columns than really needed ( for example , having a schema (a:A), (b:B), (c:C) on the lhs , while there is an expansion on the rhs for (a)->(d:D) , instead of using just a and d as the join key , would also use b and c , which may appear null on the rhs and non null on the lhs ( and this is i believe due to the use of caching to the original + auto calculating the join expressions based on common headers) Now , wondering if anyone else is interested in pushing this project forward as a joint effort . currently i have 2 contributions :
once i'm done testing those , will push them either to a fork or here ( depending if i see it interests more people in the community) |
@MarcianoAvihay do you still have your branch adding spark 3 support around, and if so would you mind pushing it? I'd like to pick up where you left off :) |
@MarcianoAvihay One option to consider... Apache Age also converts cypher to relational (postgres) sql statements. They implement the optional match using a LEFT LATERAL JOIN which I think would address the issue you identified. apache/age#175 . Spark v3.2 now supports LATERAL joins apache/age#175. There isn't a Spark dataframe api method to accomplish LATERAL joins, so a temporary view would have to be created to support the join. |
@MarcianoAvihay I also found this problem during testing. Can you tell me a specific solution or show me your solution code? Thank you |
hi,
i have encountered what i believe is a bug in the optional match implementation , and hoping to get some guidance as to how i could help remediate it (and submit a pull request for all to have this fix) .
the issue is this ->
i have done multiple tests that all lead to the same result -> using OPTIONAL MATCH to match against a non-existent relationship expansion , and afterwards having another OPTIONAL MATCH from the same variable to something that does exist , would never return the latter , and will only return a NULL .
re- example:
test graph ->
val test = morpheus.cypher(
"""
CONSTRUCT
CREATE (p1:Person {name: "Alice"})
CREATE (p2:Person {name: "Bob"})
CREATE (p3:Person {name: "Eve"})
CREATE (p4:Person {name: "Paul"})
CREATE (p1)-[:KNOWS]->(p3)
CREATE (p1)-[:KNOWS2]->(p2)
CREATE (p1)-[:KNOWS3]->(p3)
CREATE (p1)-[:KNOWS4]->(p4)
CREATE (p1)-[:KNOWS5]->(p2)
return GRAPH
""".stripMargin
).graph
--- query -->
val testres = test.cypher(
"""
match (p1:Person )
optional match (p1)-[:KNOWS]->(p2)
optional match (p1)-[:KNOWS15]->(p3)
optional match (p1)-[:KNOWS4]->(p4)
return p1.name, p2.name,p3.name,p4.name
""".stripMargin)
--- yields the following result ->
+-------+-------+-------+-------+
|p1_name|p2_name|p3_name|p4_name|
+-------+-------+-------+-------+
| Bob| null| null| null|
| Eve| null| null| null|
| Paul| null| null| null|
| Alice| Eve| null| null|
+-------+-------+-------+-------+
*** But , if we would change the query order to (having the non existent expansion last):
val testres = test2.cypher(
"""
match (p1:Person )
optional match (p1)-[:KNOWS]->(p2)
optional match (p1)-[:KNOWS4]->(p4)
optional match (p1)-[:KNOWS15]->(p3)
return p1.name, p2.name,p3.name,p4.name
""".stripMargin)
-- we would then get the correct result :
+-------+-------+-------+-------+
|p1_name|p2_name|p3_name|p4_name|
+-------+-------+-------+-------+
| Alice| Eve| null| Paul|
| Bob| null| null| null|
| Eve| null| null| null|
| Paul| null| null| null|
-- which is not expected - since alice is also connected via KNOWS4 to paul.
same query and graph setup in neo4j yields (in any ordering of the query) :
i found in the OptionalMatchTests.scala in morpheus the following test which doesn't cover the above, and nothing else that does (there is no test in morpheus \ tck that cover doesnt exist - exist :
val g = initGraph(
"""
|CREATE (:DoesExist {property: 42})
|CREATE (:DoesExist {property: 43})
|CREATE (:DoesExist {property: 44})
""".stripMargin)
can any of the dev's please share they're thoughts as to how hard would it be to try and get to the expected behaviour? and where should i start looking in the solution to try and fix ?
is it a matter of a complex modification to the relation+logical planner to get the required behaviour ? if there is some small tweak that comes to mind it would greatly help me ( i am trying to use morpheus to automate a combination of left outer joins and inner joins needed in a very large datasets
thanks very much !
The text was updated successfully, but these errors were encountered: