Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: parser, binder, planner and interactive shell for bustub #257

Closed
15 of 17 tasks
skyzh opened this issue Jul 28, 2022 · 6 comments
Closed
15 of 17 tasks

RFC: parser, binder, planner and interactive shell for bustub #257

skyzh opened this issue Jul 28, 2022 · 6 comments

Comments

@skyzh
Copy link
Member

skyzh commented Jul 28, 2022

I just realized that bustub doesn't have an interactive shell (or accept SQL statements) like other database systems do. From my perspective, this might make it a little bit hard for students to have a complete overview of how a relational database system looks like. Students cannot use SQL queries to test out their own changes. They have to manually construct query plans (which is prune to error!) to test the implementation of executors. Examples:

// SELECT colA, colB FROM test_3 LIMIT 10
TEST_F(ExecutorTest, DISABLED_SimpleLimitTest) {
auto *table_info = GetExecutorContext()->GetCatalog()->GetTable("test_3");
auto &schema = table_info->schema_;
auto *col_a = MakeColumnValueExpression(schema, 0, "colA");
auto *col_b = MakeColumnValueExpression(schema, 0, "colB");
auto *out_schema = MakeOutputSchema({{"colA", col_a}, {"colB", col_b}});
// Construct sequential scan
auto seq_scan_plan = std::make_unique<SeqScanPlanNode>(out_schema, nullptr, table_info->oid_);
// Construct the limit plan
auto limit_plan = std::make_unique<LimitPlanNode>(out_schema, seq_scan_plan.get(), 10);
// Execute sequential scan with limit
std::vector<Tuple> result_set{};
GetExecutionEngine()->Execute(limit_plan.get(), &result_set, GetTxn(), GetExecutorContext());
// Verify results
ASSERT_EQ(result_set.size(), 10);
for (auto i = 0UL; i < result_set.size(); ++i) {
auto &tuple = result_set[i];
ASSERT_EQ(tuple.GetValue(out_schema, out_schema->GetColIdx("colA")).GetAs<int32_t>(), static_cast<int32_t>(i));
ASSERT_EQ(tuple.GetValue(out_schema, out_schema->GetColIdx("colB")).GetAs<int32_t>(), static_cast<int32_t>(i));
}
}

I'm thinking of introducing a very simple parser, binder and planner framework to the bustub project, and upon that, having an interactive shell. The parser, binder and planner may have very limited functionalities (e.g. don't support subqueries), but will support at least the following queries (as in the executor test):

-- select star
SELECT * FROM empty_table2;

-- simple filter
SELECT col_a, col_b FROM test_1 WHERE col_a < 500

-- select with limit
SELECT colA, colB FROM test_3 LIMIT 10

-- insert with values
INSERT INTO empty_table2 VALUES (100, 10), (101, 11), (102, 12)

-- insert with select from
INSERT INTO empty_table2 SELECT col_a, col_b FROM test_1 WHERE col_a < 500

-- update
UPDATE test_3 SET colB = colB + 1;

-- delete
DELETE FROM test_1 WHERE col_a == 50;

-- joins with explicit condition
SELECT test_1.col_a, test_1.col_b, test_2.col1, test_2.col3 FROM test_1 JOIN test_2 ON test_1.col_a = test_2.col1;

-- simple aggregation
SELECT COUNT(col_a), SUM(col_a), min(col_a), max(col_a) from test_1;

-- group aggregation
SELECT count(col_a), col_b, sum(col_c) FROM test_1 Group By col_b HAVING count(col_a) > 100

-- select distinct (w/o support for distinct aggregation like `count(distinct colC)`)
SELECT DISTINCT colC FROM test_7

-- create table
CREATE TABLE table (v1 int);

-- drop table
DROP TABLE table;

Currently I have no idea how much code work will it take to have such binder / parser / planner / interactive shell inside bustub. I also don't expect to finish everything before the fall semester. From the discussion inside the Slack channel, cc @garrisonhess may already have some ideas on how to implement a SQLite wrapper (instead of writing our own version of every component) around bustub (But I don't know the progress. Would you please share about what you've done if you have time? Thanks!). Anyway, it would be good if we can use bustub like other database systems (e.g. SQLite) to accept SQL as input.

I'd like to bring this topic to discussion. Comments are welcomed!

@skyzh
Copy link
Member Author

skyzh commented Jul 28, 2022

Missed one important functionality:

--- explain
EXPLAIN SELECT DISTINCT colC FROM test_7

@apavlo
Copy link
Member

apavlo commented Jul 28, 2022

@garrisonhess Can you point him at your BusTub branch with the DuckDB client?

@lmwnshn
Copy link
Collaborator

lmwnshn commented Jul 28, 2022

As a nice-to-have (doesn't have to come with this change or be written by you, other TAs might be able to pick it up), being able to visualize the plan tree would be helpful for students too. e.g., a former TA Ricky wrote the code to visualize B-Trees via graphviz and I think that was a major improvement in student quality of life. Something that might be nice to consider upfront as you bring in SQL.

@GavinRay97
Copy link

GavinRay97 commented Aug 15, 2022

I can volunteer to implement the expression AST + query plan visualizer.

Also, it's certainly not as ergonomic as SQL, but for fast feedback during development and prototyping, may be useful to mention cling to students (C++ repl using clang, formed the basis for new llvm/clang-repl project):

CLion recently gained support for it in-editor which makes it kind of nice:

Can give a comfy feedback loop compared to edit -> compile -> run, more like a scripting language 👍

@skyzh
Copy link
Member Author

skyzh commented Aug 30, 2022

Task list updated.

@skyzh
Copy link
Member Author

skyzh commented Sep 26, 2022

All of the tasks are already done in this roadmap (there're still pending PRs). BusTub now has decent Postgres-dialect SQL support. Close this for now.

@skyzh skyzh closed this as completed Sep 26, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants