|
| 1 | +# Velox Coding Style and Best Practices |
| 2 | + |
| 3 | +The goal of this document is to make Velox developers more productive by |
| 4 | +promoting consistency within the codebase, and encouraging common practices |
| 5 | +which will make the codebase easier to read, edit, maintain and debug in the |
| 6 | +future. |
| 7 | + |
| 8 | +## Cpp Style |
| 9 | + |
| 10 | +Many aspects of cpp style will be covered by clang-format, such as spacing, |
| 11 | +line width, indentation and ordering (for includes, using directives and etc). |
| 12 | + |
| 13 | +* Always ensure your code is clang-format compatible. |
| 14 | +* If you’re working on a legacy file which is not clang-format compliant yet, |
| 15 | + refrain from formatting the entire file in the same PR as it pollutes your |
| 16 | + original PR and makes it harder to review. |
| 17 | + * Submit a separate diff/PR with the format-only changes. |
| 18 | + |
| 19 | +## Naming Conventions |
| 20 | + |
| 21 | +* Use **PascalCase** for types (classes, structs, enums, type aliases, type |
| 22 | + template parameters) and file names. |
| 23 | +* Use **camelCase** for functions, member and local variables, and non-type |
| 24 | + template parameters. |
| 25 | +* **camelCase_** for private and protected members variables. |
| 26 | +* Use **snake_case** for namespace names and build targets |
| 27 | +* Use **UPPER_SNAKE_CASE** for macros. |
| 28 | +* Use **kPascalCase** for static constants. |
| 29 | + |
| 30 | + |
| 31 | +## Comments |
| 32 | + |
| 33 | +Some good practices about code comments: |
| 34 | + |
| 35 | +* Optimize code for the reader, not the writer. |
| 36 | + * Velox is growing and starting to be used by multiple compute engines and |
| 37 | + teams; as a result, more time will be spent reading code than writing it. |
| 38 | +* Overall goal: make the code more obvious and remove obscurity. |
| 39 | +* Comments should capture information that was on the writer’s mind, but |
| 40 | + **couldn’t be represented as code.** |
| 41 | + * As such, refrain from adding obvious comments, e.g: simple getter/setter |
| 42 | + methods. |
| 43 | + * However, “obviousness” is in the reader’s mind - if a reviewer says |
| 44 | + something is not obvious, then it is not obvious. |
| 45 | + * Consider that the audience is an experienced Software Engineer with a |
| 46 | + moderate knowledge of the codebase. |
| 47 | +* What should be commented: |
| 48 | + * Every File |
| 49 | + * Every Class |
| 50 | + * Every method that's not an obvious getter/setter |
| 51 | + * Every member variable |
| 52 | + * Do not simply restate the variable name. Either add a comment explaining |
| 53 | + the semantic meaning of that variable or do not add a comment at all. |
| 54 | + This is an anti-pattern: ``` // A simple counter. size_t count_{0}; ``` |
| 55 | + * For functions with large bodies, a good practice is to group blocks of |
| 56 | + related code, and precede them with a blank line and a high-level comment |
| 57 | + on what the block does. |
| 58 | + |
| 59 | +About comment style: |
| 60 | + |
| 61 | +* Header comment for source files: |
| 62 | + * All source files (.h and .cpp) should have a standard header in a large |
| 63 | + comment block at the top; this should include the standard copyright |
| 64 | + notice, the license header, and a brief file description. This comment |
| 65 | + block should use `/* */` |
| 66 | +* For single line comments, always use `//` (with a space separator before the |
| 67 | + comment). |
| 68 | +* Comments should be full english sentences, starting with a capital letter and |
| 69 | + ending with a period (.). |
| 70 | + * Use: ``` // True if this node only sorts a portion of the final result. |
| 71 | + ``` |
| 72 | + * Instead of: ``` // true if this node only sorts a portion of the final |
| 73 | + result ``` |
| 74 | +* For multi-line comments: |
| 75 | + * Velox will follow the doxygen comment style. |
| 76 | + * For multi-line comments within actual code blocks (the ones which are not |
| 77 | + to be exposed as documentation, use `//` (double slashes). |
| 78 | + * For comments on headers for classes, functions, methods and member |
| 79 | + variables (the ones to be exposed as documentation), use `///` (three |
| 80 | + slashes) at the front of each line. |
| 81 | + * Don't use old-style `/* */` comments inside code or on top-level header |
| 82 | + comments. It adds two additional lines and makes headers more verbose than |
| 83 | + they need to be. |
| 84 | +* Special comments: |
| 85 | + * Use this format when something needs to be fixed in the future: ``` // |
| 86 | + TODO: Description of what needs to be fixed. ``` |
| 87 | + * Include enough context in the comment itself to make clear what will be |
| 88 | + done, without requiring any references from outside the code. |
| 89 | + * Do not include the author’s username. If required, this can always be |
| 90 | + retrieved from git blame. |
| 91 | +
|
| 92 | +## Asserts and CHECKs |
| 93 | +
|
| 94 | +* For assertions and other types of validation, use `VELOX_CHECK_*` macros |
| 95 | + * `VELOX_CHECK_*` will categorize the error to be an internal runtime error. |
| 96 | + * `VELOX_USER_CHECK_*` will categorize the error to be a user error. |
| 97 | +* When comparing two values/expressions, prefer to use: |
| 98 | + * `VELOX_CHECK_LT(idx, children_.size());` |
| 99 | +* Rather than: |
| 100 | + * `VELOX_CHECK(idx < children_.size());` |
| 101 | +* The former will evaluate and include both expressions values in the |
| 102 | + exception. |
| 103 | +* All macro also provide the following optional features: |
| 104 | + * Specifying error code (error code listed in |
| 105 | + `velox/common/base/VeloxException.h`): |
| 106 | + * `VELOX_CHECK_EQ(v1[, v2[, error_code]]);` |
| 107 | + * `VELOX_USER_CHECK_EQ(v1[, v2[, error_code]])` |
| 108 | + * Appending error message: |
| 109 | + * `VELOX_CHECK_EQ(v1, v2, “Some error message”)` |
| 110 | + * `VELOX_USER_CHECK_EQ(v1, v2, “Some error message”)` |
| 111 | + * Error message formatting via fmt: |
| 112 | + * `VELOX_USER_CHECK_EQ( v1, v2, “My complex error message {} and {}”, str, |
| 113 | + i)` |
| 114 | + * Note that the values of v1 and v2 are already included in the exception |
| 115 | + message by default. |
| 116 | +
|
| 117 | +## Variables |
| 118 | +
|
| 119 | +* Do not declare more than one variable on a line (this helps accomplish the |
| 120 | + other guidelines, promotes commenting, and is good for tagging). |
| 121 | + * Obvious exception: for-loop initialization |
| 122 | +* Initialize most variables at the time of declaration (unless they are class |
| 123 | + objects without default constructors). |
| 124 | + * Prefer using uniform-initialization to make initialization more consistent |
| 125 | + across types, e.g., prefer `size_t size{0};` over `size_t size = 0;` |
| 126 | +* Declare your variables in the smallest scope possible. |
| 127 | +* Declare your variables as close to the usage point as possible within the |
| 128 | + given scope. |
| 129 | +* Don't group all your variables at the top of the scope -- this makes the code |
| 130 | + much harder to follow. |
| 131 | +* If the variable or function parameter is a pointer or reference type, group |
| 132 | + the `*` or &` with the type -- pointer-ness or reference-ness is an attribute |
| 133 | + of the type, not the name. |
| 134 | + * `int* foo;` `const Bar& bar;` NOT `int *foo; `const Bar &bar`; |
| 135 | + * Beware that `int* foo, bar;` will be parsed as declaring `foo` as an `int*` |
| 136 | + and `bar` as an `int`. Note that multiple declaration is discouraged. |
| 137 | +* For member variables: |
| 138 | + * Group member variable and methods based on their visibility (public, |
| 139 | + protected and private) |
| 140 | + * It’s ok to have multiple blocks for a given level. |
| 141 | + * Most member variables should come with a short comment description and an |
| 142 | + empty line above that. |
| 143 | + * Refrain from using public member variables whenever possible, in order to |
| 144 | + promote encapsulation. |
| 145 | + * Leverage getter/setter methods when appropriate. |
| 146 | +* Prefer to use value-types, `std::optional`, and `std::unique_ptr` in that |
| 147 | + order. |
| 148 | + * Value-types are conceptually the simplest and cheapest. |
| 149 | + * std::optional allows you to express “may be null” without the additional |
| 150 | + complexity of manual storage duration. |
| 151 | + * `std::unique_ptr<>` should be used for types that are not cheaply movable |
| 152 | + but need to transfer ownership, or which are too large to store on the |
| 153 | + stack. Note that most types that perform large allocations already store |
| 154 | + their bulk memory on-heap. |
| 155 | +
|
| 156 | +## Constants |
| 157 | +
|
| 158 | +* Always use `nullptr` if you need a constant that represents a null pointer |
| 159 | + (`T*` for some `T`); use `0` otherwise for a zero value. |
| 160 | +* For large literal numbers, use ‘ to make it more readable, e.g: `1’000’000` |
| 161 | + instead of `1000000`. |
| 162 | +* For floating point literals, never omit the initial 0 before the decimal |
| 163 | + point (always `0.5`, not `.5`). |
| 164 | +* File level variables and constants should be defined in an anonymous |
| 165 | + namespace. |
| 166 | +* Always prefer const variables and enum to using preprocessor (#define) to |
| 167 | + define constant values. |
| 168 | +* Prefer `enum class` over `enum` for better type safety. |
| 169 | +* As a general rule, do not use string literals without declaring a named |
| 170 | + constant for them. |
| 171 | + * The best way to make a constant string literal is to use constexpr |
| 172 | + `std::string_view`/folly::StringPiece` |
| 173 | + * **NEVER** use `std::string` - this makes your code more prone to SIOF bugs. |
| 174 | + * Avoid `const char* const` and `const char*` - these are less efficient to |
| 175 | + convert to `std::string` later on in your program if you ever need to |
| 176 | + because `std::string_view`/`folly::StringPiece` knows its size and can use |
| 177 | + a more efficient constructor. `std::string_view`/`folly::StringPiece` also |
| 178 | + has richer interfaces and often works as a drop-in replacement to |
| 179 | + `std::string`. |
| 180 | + * Need compile-time string concatenation? You can use `folly::FixedString` |
| 181 | + for that. |
| 182 | +
|
| 183 | +## Macros |
| 184 | +
|
| 185 | +Do not use them unless absolutely necessary. Whenever possible, use normal |
| 186 | +inline functions or templates instead. If you absolutely need, remember that |
| 187 | +macro names are always upper-snake-case. Also: |
| 188 | +
|
| 189 | +* Use parentheses to sanitize complex inputs. |
| 190 | + * For example, `#define MUL(x, y) x * y` is incorrect for input like `MUL(2, |
| 191 | + 1 + 1)`. The author probably meant `#define MUL(x, y) ((x) * (y))` instead. |
| 192 | +* When making macros that have multiple statements, use this idiom: |
| 193 | + * `do { stmt1; stmt2; } while (0)` |
| 194 | + * This allows this block to act the most like a single statement, usable in |
| 195 | + if/for/while even if they don't use braces and forces use of a trailing |
| 196 | + semicolon. |
| 197 | +
|
| 198 | +## Headers and Includes |
| 199 | +
|
| 200 | +* All header files must have a single-inclusion guard using `#pragma once` |
| 201 | +* Included files and using declarations should all be at the top of the file, |
| 202 | + and ordered properly to make it easier to see what is included. |
| 203 | + * Use clang-format to order your include and using directives. |
| 204 | +* Includes should always use the full path (relative to github’s root dir). |
| 205 | +* Whenever possible, try to forward-declare as much as possible in the .h and |
| 206 | + only `#includ`e things you need the full implementation for. |
| 207 | + * For instance, if you just use a `Class*` or `Class&` in a header file, |
| 208 | + forward-declare `Class` instead of including the full header to minimize |
| 209 | + header dependencies. |
| 210 | +* Put small private classes that will only get used in one place into a .cpp, |
| 211 | + inside an anonymous namespace. A common example is a custom comparator class. |
| 212 | +* Be aware of what goes into your .h files. If possible, try to separate them |
| 213 | + into "Public API" .h files that may be included by external modules. These |
| 214 | + public .h files should contain only those functions or classes that external |
| 215 | + users need to access. |
| 216 | +* No large blocks of code should be in .h files that are widely included if |
| 217 | + these blocks aren't vital to all users' understanding of the interface the .h |
| 218 | + represents. Split these large implementation blocks into a separate `-inl.h` |
| 219 | + file. |
| 220 | + * `-inl.h` files should exist only to aid readability with the expectation |
| 221 | + that a user of your library should only need to read the .h to understand |
| 222 | + the interface. Only if someone needs to understand your implementation |
| 223 | + should opening the `-inl.h` be necessary (or the .cpp file, for that |
| 224 | + matter). |
| 225 | +
|
| 226 | +## Function Arguments |
| 227 | +
|
| 228 | +* Const |
| 229 | + * All objects, whenever possible, should be passed to functions as const-refs |
| 230 | + (`const T&`). This makes APIs much clearer as to whether it’s an input or |
| 231 | + output arg, reduces the need for NULL-checks (and reduces crashing bugs), |
| 232 | + and helps enforce good encapsulation. |
| 233 | + * An obvious exception is objects that are trivially copy- or |
| 234 | + move-constructible, like primitive types or `std::unique_ptr`. |
| 235 | + * Don't pass by const reference when passing by value is both correct and |
| 236 | + cheaper. |
| 237 | + * All non-static member functions should be marked as const if calling them |
| 238 | + doesn't change the behavior/response of any member. |
| 239 | +* References as function arguments. |
| 240 | + * For input or read-only parameters, pass by `const&` |
| 241 | + * For nullable or optional parameters, use higher order primitives like |
| 242 | + `std::optional`; raw pointers are also appropriate. |
| 243 | + * For output, mutable, or in/out parameters: |
| 244 | + * Non-const ref is the recommendation if the argument is not nullable. |
| 245 | + * If it’s nullable, use a raw pointer. |
| 246 | + * Always use `std::optional` instead of `folly::Optional` and |
| 247 | + `boost::optional`. |
| 248 | +* Prefer `std::string_view` to `const std::string&` |
| 249 | + * This avoids unnecessary construction of a std::string when passing other |
| 250 | + types. This often happens when string literals are passed into functions. |
| 251 | + * If you need a `std::string` inside the function, however, take a |
| 252 | + `std::string` to move the copy to the API boundary. This allows the caller |
| 253 | + to avoid the copy with a move. |
| 254 | +* Almost never take an r-value reference as a function argument; take the |
| 255 | + argument by value instead. This usually as efficient as taking an r-value |
| 256 | + reference, but simpler and more flexible at the call site. For example: ``` |
| 257 | + InPredicate::InPredicate(folly::F14HashSet<T>&& rhs) : rhs_(std::move(rhs)) |
| 258 | + {} ``` |
| 259 | + * is more complex than the version that takes expr by value: ``` |
| 260 | + InPredicate::InPredicate(folly::F14HashSet<T> rhs) : rhs_(std::move(rhs)) |
| 261 | + {} ``` |
| 262 | + * However, the first requires either a std::move() or an explicit copy at the |
| 263 | + call point; the second doesn’t. The performance of the two should be almost |
| 264 | + identical. |
| 265 | +* Add comments when it’s not clear which argument is intended in a function |
| 266 | + call. This is particularly a problem for longer signatures with repeated |
| 267 | + types or constant arguments. For example, `phrobinicate(/*elements=*/{1, 2}, |
| 268 | + /*startOffset=*/0, /*length=*/2)` |
| 269 | +* Use the /*argName=*/value format (note the lack of spaces). Clang-tidy |
| 270 | + supports checking the given argument name against the declaration when this |
| 271 | + format is used. |
| 272 | +
|
| 273 | +## Namespaces |
| 274 | +
|
| 275 | +* All Velox code should be placed inside the facebook::velox namespace |
| 276 | +* Always use nested namespace definition: |
| 277 | + * `namespace facebook::velox::core {` Rather than |
| 278 | + * `namespace facebook { namespace velox { namespace core {` |
| 279 | +* Use sub namespaces (e.g: facebook::velox::core) to logically group large |
| 280 | + chunks of related code and prevent identifier clashes. |
| 281 | + * Namespaces should make it easier to code, not harder. Refrain from creating |
| 282 | + hierarchies that are way too long. |
| 283 | + * Namespace don’t necessarily need to reflect the on-disk layout. |
| 284 | + * This isn’t java. |
| 285 | +* Guidelines for importing namespaces: |
| 286 | + * Don't EVER put `using namespace anything;` OR `using anything::anything;` |
| 287 | + in a header file. This pollutes the global namespace and makes it extremely |
| 288 | + difficult to refactor code. |
| 289 | + * It’s very useful to not have to fully-qualify types as it can make code |
| 290 | + more readable. In header files, it's unavoidable. ALWAYS fully qualify |
| 291 | + names in a header (e.g. it's `std::string`, not just `string`). |
| 292 | + * In cpp files, best practice is to add a using declaration after your list |
| 293 | + of `#includes` (e.g. `using tests::VectorMaker;`) and to then just write |
| 294 | + `VectorMaker`. |
| 295 | + * `using namespace std;` can lead to surprising behavior, especially when |
| 296 | + other modules don't follow best practices with their using statements and |
| 297 | + what they dump into global scope. The best defensive programming practice |
| 298 | + for std it to fully qualify symbols, e.g: `std::string`, std::vector |
| 299 | + * Do not use `using namespace std;` |
| 300 | + * Use `using namespace anything_else;` somewhat sparingly – the whole point |
| 301 | + of namespaces is obliterated when the code starts doing this prolifically. |
| 302 | + Frequently, `using foo::bar::BazClass;` is better. |
0 commit comments