Skip to content

Commit d188613

Browse files
bikramSingh91facebook-github-bot
authored andcommitted
Add input_dir and lazy_column_list_path option to expression runner (facebookincubator#3231)
Summary: This patch adds two startup flags to ExpressionRunner: input_dir: allows it to find required input files in a specified directory. It searches for files generated by ExpressionVerfier like input_vector, result_vector, sql, indices_of_lazy_columns and complex_constants. lazy_column_list_path: allows it to consume this list of column indices that represents the columns in the input vector to be wrapped in lazy Pull Request resolved: facebookincubator#3231 Test Plan: - Added a unit test for a utility function - Manually tested Reviewed By: mbasmanova Differential Revision: D41293907 Pulled By: bikramSingh91 fbshipit-source-id: 68c287c8d1e261296e3d6169992b1cb8c963542a
1 parent 9ae4bf9 commit d188613

13 files changed

+216
-83
lines changed

velox/docs/develop/testing/fuzzer.rst

+17-7
Original file line numberDiff line numberDiff line change
@@ -164,21 +164,27 @@ input vector and expression to files and replay these later.
164164

165165
2. Run Expression Runner using generated files.
166166

167-
``--repro_persist_path <path/to/directory>`` flag tells the Fuzzer to save
168-
input vector and expression SQL to files in the specified directory and print
169-
out the exact paths. Fuzzer uses :doc:`VectorSaver <../debugging/vector-saver>` for storing vectors on disk
170-
while preserving encodings.
171-
172-
ExpressionRunner takes a path to input vector, path to expression SQL and
173-
"mode" and evaluates the specified expression on the specified data.
167+
``--repro_persist_path <path/to/directory>`` flag tells the Fuzzer to save the
168+
input vector, initial result vector, expression SQL, and other relevant data to files in a new directory saved within
169+
the specified directory. It also prints out the exact paths for these. Fuzzer uses :doc:`VectorSaver <../debugging/vector-saver>`
170+
for storing vectors on disk while preserving encodings.
171+
172+
ExpressionRunner needs at the very least a path to input vector and path to expression SQL to run.
173+
However, you might need more files to reproduce the issue. All of which will be present in the directory
174+
that the fuzzer test generated. You can directly point the ExpressionRunner to that directory using --fuzzer_repro_path
175+
where it will pick up all the files automatically or you can specify each explicitly using other startup flags.
174176
ExpressionRunner supports the following flags:
175177

178+
* ``--fuzzer_repro_path`` directory path where all input files (required to reproduce a failure) that are generated by the Fuzzer are expected to reside. ExpressionRunner will automatically pick up all the files from this folder unless they are explicitly specified via their respective startup flag.
179+
176180
* ``--input_path`` path to input vector that was created by the Fuzzer
177181

178182
* ``--sql_path`` path to expression SQL that was created by the Fuzzer
179183

180184
* ``--complex_constant_path`` optional path to complex constants that aren't accurately expressable in SQL (Array, Map, Structs, ...). This is used with SQL file to reproduce the exact expression, not needed when the expression doesn't contain complex constants.
181185

186+
* ``--lazy_column_list_path`` optional path for the file stored on-disk which contains a vector of column indices that specify which columns of the input row vector should be wrapped in lazy. This is used when the failing test included input columns that were lazy vector.
187+
182188
* ``--result_path`` optional path to result vector that was created by the Fuzzer. Result vector is used to reproduce cases where Fuzzer passes dirty vectors to expression evaluation as a result buffer. This ensures that functions are implemented correctly, taking into consideration dirty result buffer.
183189

184190
* ``--mode`` run mode. One of "verify", "common" (default), "simplified".
@@ -189,8 +195,12 @@ ExpressionRunner supports the following flags:
189195

190196
- ``simplified`` evaluates the expression using simplified path and prints the results to stdout.
191197

198+
- ``query`` evaluate SQL query specified in --sql or --sql_path and print out results. If --input_path is specified, the query may reference it as table 't'.
199+
192200
* ``--num_rows`` optional number of rows to process in common and simplified modes. Default: 10. 0 means all rows. This flag is ignored in 'verify' mode.
193201

202+
* ``--store_result_path`` optional directory path for storing the results of evaluating SQL expression or query in 'common', 'simplified' or 'query' modes.
203+
194204
Example command:
195205

196206
::

velox/expression/tests/ExpressionFuzzer.cpp

+3-3
Original file line numberDiff line numberDiff line change
@@ -256,8 +256,8 @@ std::vector<column_index_t> generateLazyColumnIds(
256256
std::vector<column_index_t> columnsToWrapInLazy;
257257
if (FLAGS_lazy_vector_generation_ratio > 0) {
258258
for (column_index_t idx = 0; idx < rowVector->childrenSize(); idx++) {
259-
if (rowVector->childAt(idx) &&
260-
vectorFuzzer.coinToss(FLAGS_lazy_vector_generation_ratio)) {
259+
VELOX_CHECK_NOT_NULL(rowVector->childAt(idx));
260+
if (vectorFuzzer.coinToss(FLAGS_lazy_vector_generation_ratio)) {
261261
columnsToWrapInLazy.push_back(idx);
262262
}
263263
}
@@ -413,7 +413,7 @@ void ExpressionFuzzer::reSeed() {
413413
}
414414

415415
RowVectorPtr ExpressionFuzzer::generateRowVector() {
416-
return vectorFuzzer_.fuzzRow(
416+
return vectorFuzzer_.fuzzInputRow(
417417
ROW(std::move(inputRowNames_), std::move(inputRowTypes_)));
418418
}
419419

velox/expression/tests/ExpressionRunner.cpp

+19-3
Original file line numberDiff line numberDiff line change
@@ -20,7 +20,6 @@
2020
#include "velox/common/memory/Memory.h"
2121
#include "velox/core/QueryCtx.h"
2222
#include "velox/exec/tests/utils/AssertQueryBuilder.h"
23-
#include "velox/exec/tests/utils/QueryAssertions.h"
2423
#include "velox/expression/Expr.h"
2524
#include "velox/expression/tests/ExpressionRunner.h"
2625
#include "velox/expression/tests/ExpressionVerifier.h"
@@ -29,6 +28,7 @@
2928
#include "velox/parse/QueryPlanner.h"
3029
#include "velox/parse/TypeResolver.h"
3130
#include "velox/vector/VectorSaver.h"
31+
#include "velox/vector/fuzzer/VectorFuzzer.h"
3232

3333
namespace facebook::velox::test {
3434

@@ -113,7 +113,8 @@ void ExpressionRunner::run(
113113
const std::string& resultPath,
114114
const std::string& mode,
115115
vector_size_t numRows,
116-
const std::string& storeResultPath) {
116+
const std::string& storeResultPath,
117+
const std::string& lazyColumnListPath) {
117118
VELOX_CHECK(!sql.empty());
118119

119120
std::shared_ptr<core::QueryCtx> queryCtx{std::make_shared<core::QueryCtx>()};
@@ -134,6 +135,12 @@ void ExpressionRunner::run(
134135
VELOX_CHECK_GT(inputVector->size(), 0, "Input vector must not be empty.");
135136
}
136137

138+
std::vector<column_index_t> columnsToWrapInLazy;
139+
if (!lazyColumnListPath.empty()) {
140+
columnsToWrapInLazy =
141+
restoreStdVectorFromFile<column_index_t>(lazyColumnListPath.c_str());
142+
}
143+
137144
parse::registerTypeResolver();
138145

139146
if (mode == "query") {
@@ -179,8 +186,17 @@ void ExpressionRunner::run(
179186
VELOX_CHECK_EQ(
180187
1, typedExprs.size(), "'verify' mode supports only one SQL expression");
181188
test::ExpressionVerifier(&execCtx, {false, ""})
182-
.verify(typedExprs[0], inputVector, std::move(resultVector), true);
189+
.verify(
190+
typedExprs[0],
191+
inputVector,
192+
std::move(resultVector),
193+
true,
194+
columnsToWrapInLazy);
183195
} else if (mode == "common") {
196+
if (!columnsToWrapInLazy.empty()) {
197+
inputVector =
198+
VectorFuzzer::fuzzRowChildrenToLazy(inputVector, columnsToWrapInLazy);
199+
}
184200
exec::ExprSet exprSet(typedExprs, &execCtx);
185201
auto results = evaluateAndPrintResults(exprSet, inputVector, rows, execCtx);
186202
if (!storeResultPath.empty()) {

velox/expression/tests/ExpressionRunner.h

+8-1
Original file line numberDiff line numberDiff line change
@@ -43,6 +43,12 @@ class ExpressionRunner {
4343
/// "simplified"]
4444
/// @param numRows Maximum number of rows to process. 0 means 'all' rows.
4545
/// Applies to "common" and "simplified" modes only.
46+
/// @param storeResultPath The path to a directory on disk where the results
47+
/// of expression or query evaluation will be stored. If empty, the results
48+
/// will not be stored.
49+
/// @param lazyColumnListPath The path to on-disk vector of column indices
50+
/// that specify which columns of the input row vector should be wrapped in
51+
/// lazy.
4652
///
4753
/// User can refer to 'VectorSaver' class to see how to serialize/preserve
4854
/// vectors to disk.
@@ -53,7 +59,8 @@ class ExpressionRunner {
5359
const std::string& resultPath,
5460
const std::string& mode,
5561
vector_size_t numRows,
56-
const std::string& storeResultPath);
62+
const std::string& storeResultPath,
63+
const std::string& lazyColumnListPath);
5764

5865
/// Parse comma-separated SQL expressions. This should be treated as private
5966
/// except for tests.

velox/expression/tests/ExpressionRunnerTest.cpp

+71-6
Original file line numberDiff line numberDiff line change
@@ -18,10 +18,14 @@
1818
#include <folly/init/Init.h>
1919
#include <gflags/gflags.h>
2020
#include <gtest/gtest.h>
21+
#include "velox/common/base/Fs.h"
22+
#include "velox/expression/tests/ExpressionVerifier.h"
2123
#include "velox/functions/prestosql/aggregates/RegisterAggregateFunctions.h"
2224
#include "velox/functions/prestosql/registration/RegistrationFunctions.h"
2325
#include "velox/vector/VectorSaver.h"
2426

27+
using namespace facebook::velox;
28+
2529
DEFINE_string(
2630
input_path,
2731
"",
@@ -39,7 +43,7 @@ DEFINE_string(
3943
DEFINE_string(
4044
complex_constant_path,
4145
"",
42-
"Path for complex constants that aren't expressable in SQL.");
46+
"Path for complex constants that aren't expressible in SQL.");
4347

4448
DEFINE_string(
4549
sql,
@@ -68,6 +72,13 @@ DEFINE_string(
6872
"results. If --input_path is specified, the query may reference it as "
6973
"table 't'.");
7074

75+
DEFINE_string(
76+
lazy_column_list_path,
77+
"",
78+
"Path for the file stored on-disk which contains a vector of column "
79+
"indices that specify which columns of the input row vector should "
80+
"be wrapped in lazy.");
81+
7182
static bool validateMode(const char* flagName, const std::string& value) {
7283
static const std::unordered_set<std::string> kModes = {
7384
"common", "simplified", "verify", "query"};
@@ -95,32 +106,86 @@ DEFINE_string(
95106
"Directory path for storing the results of evaluating SQL expression or "
96107
"query in common, simplified or query modes.");
97108

109+
DEFINE_string(
110+
fuzzer_repro_path,
111+
"",
112+
"Directory path where all input files generated by ExpressionVerifier are "
113+
"expected to reside. For more details on which files and their names are "
114+
"expected, please checkout the ExpressionVerifier class. Any file paths "
115+
"already specified via a startup flag will take precedence.");
116+
117+
static std::string checkAndReturnFilePath(
118+
const std::string_view& fileName,
119+
const std::string& flagName) {
120+
auto path = fmt::format("{}/{}", FLAGS_fuzzer_repro_path, fileName);
121+
if (fs::exists(path)) {
122+
LOG(INFO) << "Using " << flagName << " = " << path;
123+
return path;
124+
} else {
125+
LOG(INFO) << "File for " << flagName << " not found.";
126+
}
127+
return "";
128+
}
129+
130+
static void checkDirForExpectedFiles() {
131+
LOG(INFO) << "Searching input directory for expected files at "
132+
<< FLAGS_fuzzer_repro_path;
133+
134+
FLAGS_input_path = FLAGS_input_path.empty()
135+
? checkAndReturnFilePath(
136+
test::ExpressionVerifier::kInputVectorFileName, "input_path")
137+
: FLAGS_input_path;
138+
FLAGS_result_path = FLAGS_result_path.empty()
139+
? checkAndReturnFilePath(
140+
test::ExpressionVerifier::kResultVectorFileName, "result_path")
141+
: FLAGS_result_path;
142+
FLAGS_sql_path = FLAGS_sql_path.empty()
143+
? checkAndReturnFilePath(
144+
test::ExpressionVerifier::kExpressionSqlFileName, "sql_path")
145+
: FLAGS_sql_path;
146+
FLAGS_lazy_column_list_path = FLAGS_lazy_column_list_path.empty()
147+
? checkAndReturnFilePath(
148+
test::ExpressionVerifier::kIndicesOfLazyColumnsFileName,
149+
"lazy_column_list_path")
150+
: FLAGS_lazy_column_list_path;
151+
FLAGS_complex_constant_path = FLAGS_complex_constant_path.empty()
152+
? checkAndReturnFilePath(
153+
test::ExpressionVerifier::kComplexConstantsFileName,
154+
"complex_constant_path")
155+
: FLAGS_complex_constant_path;
156+
}
157+
98158
int main(int argc, char** argv) {
99159
::testing::InitGoogleTest(&argc, argv);
100160
// Calls common init functions in the necessary order, initializing
101161
// singletons, installing proper signal handlers for better debugging
102162
// experience, and initialize glog and gflags.
103163
folly::init(&argc, &argv);
104164

165+
if (!FLAGS_fuzzer_repro_path.empty()) {
166+
checkDirForExpectedFiles();
167+
}
168+
105169
if (FLAGS_sql.empty() && FLAGS_sql_path.empty()) {
106170
std::cout << "One of --sql or --sql_path flags must be set." << std::endl;
107171
exit(1);
108172
}
109173

110174
auto sql = FLAGS_sql;
111175
if (sql.empty()) {
112-
sql = facebook::velox::restoreStringFromFile(FLAGS_sql_path.c_str());
176+
sql = restoreStringFromFile(FLAGS_sql_path.c_str());
113177
VELOX_CHECK(!sql.empty());
114178
}
115179

116-
facebook::velox::functions::prestosql::registerAllScalarFunctions();
117-
facebook::velox::aggregate::prestosql::registerAllAggregateFunctions();
118-
facebook::velox::test::ExpressionRunner::run(
180+
functions::prestosql::registerAllScalarFunctions();
181+
aggregate::prestosql::registerAllAggregateFunctions();
182+
test::ExpressionRunner::run(
119183
FLAGS_input_path,
120184
sql,
121185
FLAGS_complex_constant_path,
122186
FLAGS_result_path,
123187
FLAGS_mode,
124188
FLAGS_num_rows,
125-
FLAGS_store_result_path);
189+
FLAGS_store_result_path,
190+
FLAGS_lazy_column_list_path);
126191
}

velox/expression/tests/ExpressionRunnerUnitTest.cpp

+1-2
Original file line numberDiff line numberDiff line change
@@ -45,7 +45,6 @@ TEST_F(ExpressionRunnerUnitTest, run) {
4545
auto sqlFile = exec::test::TempFilePath::create();
4646
auto resultFile = exec::test::TempFilePath::create();
4747
const char* inputPath = inputFile->path.data();
48-
const char* sqlPath = sqlFile->path.data();
4948
const char* resultPath = resultFile->path.data();
5049
const int vectorSize = 100;
5150

@@ -60,7 +59,7 @@ TEST_F(ExpressionRunnerUnitTest, run) {
6059
saveVectorToFile(resultVector.get(), resultPath);
6160

6261
EXPECT_NO_THROW(ExpressionRunner::run(
63-
inputPath, "length(c0)", "", resultPath, "verify", 0, ""));
62+
inputPath, "length(c0)", "", resultPath, "verify", 0, "", ""));
6463
}
6564

6665
TEST_F(ExpressionRunnerUnitTest, persistAndReproComplexSql) {

0 commit comments

Comments
 (0)