now using name decenttree.cpp #9
optimisation (~2x) & tree search (~2x) by "serialising" their memory access patterns to improve spatial and temporal locality of reference.

1. PhyloTree alignment summaries may now be "borrowed" from another tree that has the same alignment. The relevant member is isSummaryBorrowed; if it is true, this instance doesn't own the summary, it is only borrowing a reference to a summary owned by another tree.
2. PhyloTree member functions copyPhyloTree and copyPhyloTreeMixlen take an extra parameter indicating whether the copy is to "borrow" the alignment summary of the original (if it has one). This matters a lot for ratefree.cpp and +R free rate models, and for modelmixture.cpp! The temporary copies of the phylo tree that are used during parameter optimization can now re-use the AlignmentSummary of the original, which means they can "linearise" their memory access to sites when they are optimising branch lengths (see changes listed below, e.g. #4, #5, #6, #7).
3. PhyloTree::setAlignment does its "name check" a different way. Rather than finding each sequence by name by scanning the tree, it asks MTree::getMapOfTaxonNameToNode for a map from name to leaf node, and checks the array of sequence names against the map (updating the id on the node for each hit). The new approach is equivalent but much faster: O(n.ln(n)) rather than O(n^2). This speeds up tree loads markedly (particularly for large trees), but it matters most for free rate parameter optimization (on middling inputs the old check was a significant factor: about 10% of parameter optimization time). The new check can be turned off by changing the FAST_NAME_CHECK symbol.
4. IQTree::optimizeModelParameters now calls prepareToComputeDistances(), so that AlignmentSummary's matrix of converted sequences will be available to be borrowed via calls to PhyloTree::copyPhyloTree (see change #2 above, and changes #5 through #7 below). Likewise IQTree::doNNISearch (so changes #5, #8 help tree searches too).
5. AlignmentPairwise::computeFunction and AlignmentPairwise::computeFuncDerv can now make use of AlignmentSummary's "matrix of converted sequences" (if it is available) via PhyloTree's accessor methods, e.g. PhyloTree::getConvertedSequenceByNumber(). For this to work as expected, callers must ask AlignmentSummary to construct that matrix *including* even sites where there is no variety at all (added the keepBoringSites parameter on the AlignmentSummary constructor for this).
6. RateMeyerDiscrete::computeFunction and RateMeyerDiscrete::computeFuncDerv likewise. RateMeyerDiscrete::normalizeRates can also make use of the "flat" frequency array exposed by PhyloTree::getConvertedSequenceFrequencies().
7. PhyloTree::computePartialLikelihoodGenericSIMD (in phylokernelnew.h) makes use of the matrix of converted sequences (if one is available), in about six (!) different places. In terms of actual effect, this is the most important change in this commit, but it needs changes #1, #2, and #4 committed too if it is to have any effect. This change speeds up both parameter optimisation and tree searching significantly.
8. As well as inv_eigenvectors, there is now an inv_eigenvectors_transposed (using the transpose makes for some faster multiplications; see change #9 listed below). ModelMarkov::calculateSquareMatrixTranspose is used to calculate the transpose of the inverse eigenvectors. Unpleasant consequence: ModelMarkov::update_eigen_pointers has to take an extra parameter. Keeping this additional member set correctly is the only thing that forced changes to modelpomomixture.cpp (and .h), modelset.cpp, and modelsubst.h.
9. ModelMarkov::computeTransMatrix and ModelMarkov::computeTransDerv now use (a) calculateExponentOfScalarMultiply and (b) aTimesDiagonalBTimesTransposeOfC to calculate transition matrices. This is quite a bit faster than the Eigen code, since it doesn't bother to construct the diagonal matrix B.asDiagonal(). (a) and (b) and the supporting functions, calculateHadamardProduct and dotProduct, are (for now) members of ModelMarkov.
10. Minor tweaks to vector processing code in phylokernelnew.h:
    (a) dotProductVec: hand-unrolled treatment of the V array.
    (b) dotProductPairAdd: treats the last item (in A and B) as the special case when handling an odd number of items. Possibly the treatment of the AD and BD arrays should be hand-unrolled here too, but I haven't tried that yet.
    (c) dotProductTriple: the check for an odd count uses & rather than % (faster!).
11. The aligned_free free function (from phylotree.h ?!) does the "pointer null?" check itself, and (because it takes a T*& rather than a T*) can itself set the pointer to nullptr. This means that client code that used to go...
        if (x) { aligned_free(x); x=NULL; }
    ...can now be simplified to just...
        aligned_free(x);
12. Next to it (in phylotree.h), there is now an ensure_aligned_allocated method. That lets you replace code like this:
        if (!eigenvalues) eigenvalues = aligned_alloc<double>(num_states);
    with:
        ensure_aligned_allocated(eigenvalues, num_states);
    which is, I reckon, more readable.
13. In many places where there was code of the form...
        if (x) { delete x; }
    ...I have replaced it with delete x (likewise delete [] x). delete always checks for null (it's required to, that's in the C++ standards), and "rolling your own check" merely devalues the check that delete will later do! I've made similar "don't bother to check for null" changes in some other files that I haven't included in this commit (since there aren't any *material* changes to anything in those files).
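To illustrate change #9 above, here is a minimal sketch (not the code from this commit) of computing A * diag(b) * transpose(C) without ever materialising the diagonal matrix. The function name sketchATimesDiagBTimesTransposeOfC, its signature, and the row-major layout are assumptions made for illustration; the real aTimesDiagonalBTimesTransposeOfC member may look quite different.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch: result = a * diag(b) * transpose(c).
// a, c and result are n-by-n, row-major; b holds the n diagonal entries.
void sketchATimesDiagBTimesTransposeOfC(const double* a, const double* b,
                                        const double* c, std::size_t n,
                                        double* result) {
    assert(result != a && result != c);  // this sketch does not work in place
    std::vector<double> scaledRow(n);
    for (std::size_t i = 0; i < n; ++i) {
        // Hadamard product of row i of a with the diagonal entries.
        for (std::size_t k = 0; k < n; ++k) {
            scaledRow[k] = a[i * n + k] * b[k];
        }
        // Column j of transpose(c) is row j of c, so both operands of
        // each dot product are read contiguously.
        for (std::size_t j = 0; j < n; ++j) {
            double sum = 0.0;
            for (std::size_t k = 0; k < n; ++k) {
                sum += scaledRow[k] * c[j * n + k];
            }
            result[i * n + j] = sum;
        }
    }
}
```

With the eigenvectors as a, the exponentiated scaled eigenvalues as b, and the transposed inverse eigenvectors as c, this has the shape of the transition matrix product described in changes #8 and #9. Every inner loop walks memory linearly, which is presumably why keeping the transpose around pays off.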
1. computeMLDistances no longer writes a distance file (it was usually written *again* in computeBioNJ; see change #2).
2. runTreeConstruction can no longer assume that the distance file has been written by computeMLDistances, so, if iqtree->computeBioNJ has not been called, it must write the file itself (even if params.user_file was false) via a call to iqtree->printDistanceFile.
3. PhyloTree now has a num_packets member, which tracks how many packets to divide work into. It can be the same as num_threads, but is generally more (at present by a factor of 2). Member functions such as getBufferPartialLhSize must allocate per packet rather than per thread. See in particular changes #9, #10 and #11.
4. Removed a little commented-out code from PhyloTree.cpp (and moved for-loop iteration variables that could've been in-loop, but weren't, in lots of places). Likewise in phylotreesse.cpp.
5. Removed redundant assignments to nullptr (particularly in PhyloTree::deleteAllPartialLh); these aren't needed now because aligned_free sets the pointer to nullptr for you.
6. Client code that set IQTree::num_threads directly now does so via setNumThreads (e.g. in phylotesting.cpp, and also in PhyloTree::optimizePatternRates), because setNumThreads also sets num_packets. For now, num_packets is set to 2*num_threads (see change #9).
7. Removed dead pointer adjustments in the "any size" case in PhyloTree::computePartialParsimonyFastSIMD. These had been left over from before that member function was vectorised. The pointers are recalculated at the start of the next iteration of the loop, so adjusting them is a waste of time. (Hopefully the compiler was optimizing the adjustments away.)
8. Fully unrolled the size 4 case in productVecMat (in phylokernelnew.h).
9. computeBounds chooses sizes for blocks of work based on the number of packets of work as well as the number of threads to be allocated. For now, it is assumed that the number of packets of work is divisible by the number of threads.
10. PhyloTree::computeTraversalInfo calculates the buffer sizes required in terms of num_packets rather than num_threads.
11. #pragma omp parallel for ... and the corresponding for loops are now for packets of work, not threads:
    (a) PhyloTree::computeTraversalInfo
    (b) PhyloTree::computeLikelihoodDervGenericSIMD (*) (two separate #pragma omp parallel for blocks)
    (c) PhyloTree::computeLikelihoodBranchGenericSIMD (*)
    (d) PhyloTree::computeLikelihoodFromBufferGenericSIMD (*)
    (e) PhyloTree::computeLikelihoodDervMixlenGenericSIMD (*)
    (f) PhyloTree::computeNonrevLikelihoodDervGenericSIMD (*) (two separate #pragma omp parallel for blocks)
    (g) PhyloTree::computeNonrevLikelihoodBranchGenericSIMD (*) (two separate #pragma omp parallel for blocks)
    The ones marked with (*) now use reductions (aimed at double) where possible, rather than #omp critical sections. I've got rid of the private(pin,i,c) stuff by declaring those variables local to the loops that use them. (This means doing horizontal_add per packet rather than after all the packets are processed.) They all use dynamic (rather than static) scheduling.
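The packet scheme in changes #3, #9 and #11 can be sketched roughly as follows. This is only an illustration under stated assumptions: sketchPacketBounds and sketchSumByPackets are hypothetical names, a plain sum stands in for the likelihood kernels, and the real computeBounds may split work differently. If the file is compiled without OpenMP support, the pragma is ignored and the loop simply runs serially.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical: split itemCount items into packetCount contiguous packets.
// Packet p covers the half-open range [bounds[p], bounds[p + 1]).
std::vector<std::size_t> sketchPacketBounds(std::size_t itemCount,
                                            int packetCount) {
    assert(packetCount > 0);
    std::vector<std::size_t> bounds(static_cast<std::size_t>(packetCount) + 1);
    for (int p = 0; p <= packetCount; ++p) {
        bounds[p] = itemCount * static_cast<std::size_t>(p)
                  / static_cast<std::size_t>(packetCount);
    }
    return bounds;
}

// Hypothetical: sum values with one loop iteration per packet, not per thread.
double sketchSumByPackets(const std::vector<double>& values, int threadCount) {
    assert(threadCount > 0);
    const int packetCount = 2 * threadCount;  // same factor of 2 as the text
    const std::vector<std::size_t> bounds =
        sketchPacketBounds(values.size(), packetCount);
    double total = 0.0;
    // Dynamic scheduling hands out packets as threads become free; the
    // reduction clause replaces a critical section around the final add.
    #pragma omp parallel for schedule(dynamic) reduction(+:total) num_threads(threadCount)
    for (int packet = 0; packet < packetCount; ++packet) {
        double packetSum = 0.0;  // loop-local, so no private() clause needed
        for (std::size_t i = bounds[packet]; i < bounds[packet + 1]; ++i) {
            packetSum += values[i];
        }
        total += packetSum;  // per-packet accumulation into the reduction
    }
    return total;
}
```

Having more packets than threads gives the dynamic scheduler something to rebalance with when some packets take longer than others, at the cost of allocating buffers per packet rather than per thread.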
was necessary (see #2 through #8 and particularly #5 below), and also drafted some additional "progress-reporting" (see #9 through #12):

1. If -mlnj-only is found on the command-line, Params::compute_ml_tree_only will be set to true (in parseArg(), in utils/tools.cpp).
2. initializeParams doesn't call computeInitialTree if compute_ml_tree_only is set to true.
3. You can't set the root of a tree if you don't yet have one; this affects code a bit later in the same function (and also in IQTree::initSettings).
4. Added PhyloTree::ensureNumberOfThreadsIsSet (and updated repetitive code that was doing what it does, in several other places). This forced some updates in other files, such as main/phylotesting.cpp.
5. Added PhyloTree::ensureModelParametersAreSet (as the same steps need to be carried out somewhat later if there isn't an initial tree before ML distances are calculated). It returns a tree string.
6. In runTreeConstruction, when compute_ml_tree_only is set, negative branches are resolved, and #4 and #5 are called only AFTER the tree has been constructed.
7. In IQTree::initCandidateTreeSet the tree mightn't be a parsimony tree as such (I think if you've combined -nt AUTO and --mlnj-only), but there will be *a* tree. The list of cases wasn't exhaustive any more.
8. Added a distanceFileWritten member variable and a getDistanceFileWritten member function to PhyloTree.
9. (This and the following changes are progress reporting changes.) Added member functions for progress reporting to PhyloTree:
    (a) initProgress (pushes where you are on a stack, and starts reporting progress if there's now one level of progress reporting on the stack)
    (b) trackProgress (bumps up progress if the progress stack depth is 1)
    (c) hideProgress (called before you write log messages to cout)
    (d) showProgress (called again after)
    (e) doneProgress (pops, and stops reporting progress if the last level of progress reporting was just popped)
    The supporting member variables are progressStackDepth and progress.
10. IQTree::optimizeNNI uses the functions added in change #9 to report progress. The problem here is that MAXSTEPS is a rather "high" guess: for n sequences it is ~2n, when the best guess for how many iterations there will be, with parallel NNIs, is on the order of ~p, where p is the worst-case "tip-to-tip" path length of the tree (probably a lot less).
11. PhyloTree::testNumThreads also uses the functions added in change #9 to report how many threads it has tried. For now, though, it badly over-reports how long it thinks it will take, because it thinks it will do max_procs iterations and each will take as long as the last. Really, it'll do max_procs/2 or so, and they go faster and faster as there are more threads in use in later steps (one more each step).
12. PhyloTree::optimizeAllBranches reports progress (via the functions added in change #9). Normally it reports progress during parameter optimisation (because I haven't written "higher-level" progress reporting for that yet).

There are some potential issues though:

1. The special-case code for dealing with "+I+G" rates doesn't yet have a counterpart when compute_ml_tree_only is set (in runTreeConstruction).
2. Likewise the code for when (params.lmap_num_quartets >= 0): there is no counterpart when compute_ml_tree_only is set, yet (this too is in runTreeConstruction). (I haven't figured out how to test the "counterpart" versions of those yet, which is why I haven't written them.)
3. If you pass -nt AUTO, I'm not sure how many threads the NJ (or whatever) step will use (I think it's all of them), and the ML distance calculations also "use all the threads" (because the thread count's not set when that code runs either). Both parallelise... well... but I'm not so sure it's a good idea that it hogs all the CPU cores like that.
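The nesting rule behind the progress functions in change #9 can be sketched like this. It is a guess at the behaviour from the description only: SketchProgress and its members are hypothetical, reports are collected in a vector instead of being drawn on the terminal, and whether hidden progress still accumulates is an assumption.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical sketch of stack-based progress reporting: only the
// outermost level (stack depth 1) owns and advances the progress total.
struct SketchProgress {
    int stackDepth = 0;
    double total = 0.0;
    double done = 0.0;
    bool hidden = false;
    std::vector<std::string> reports;  // stands in for terminal output

    void initProgress(double workToDo) {
        if (++stackDepth == 1) {  // first level pushed: start reporting
            total = workToDo;
            done = 0.0;
            hidden = false;
        }
    }
    void trackProgress(double amount) {
        if (stackDepth != 1) return;  // nested levels do not bump progress
        done += amount;
        if (!hidden) {
            reports.push_back(std::to_string(done) + " of " +
                              std::to_string(total));
        }
    }
    void hideProgress() { hidden = true; }   // before writing log messages
    void showProgress() { hidden = false; }  // after writing log messages
    void doneProgress() {
        assert(stackDepth > 0);  // must pair with an earlier initProgress
        --stackDepth;            // reporting stops once the depth reaches 0
    }
};
```

Under this reading, a function such as optimizeAllBranches can call initProgress unconditionally: if a caller is already reporting, the nested calls are ignored rather than producing two competing progress displays.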
@NicolaDM There are conflicts in modelunrest.cpp and .h. Please have a look above. Do you remember what you changed? (It would help in reviewing the merge.)
Sorry Minh, I don't remember much about this. Looking at my commits it looks like I added a function writeInfo() to both .cpp and .h files, and I modified
@trongnhanuit Am I right that you changed ModelUnrest to read the parameters? It looks like both you and Nicola changed this part of the class, causing conflicts...
Hi Minh,
Yes, I added some code to read the model parameters for Unrest.
author CD <cuongbb@live.com> 1514046707 +0700 committer Thomas Wong <thomaskf@gmail.com> 1740626153 +1100 correcting mt matrices Shorten PoMo usages release version 1.6.0 Fix segfault in kernel when number of states is odd due to mem unalignment (reported by a web user) Fix wrong msg about one state is observed in alignment release version 1.6.1 release 1.6.1 again (fix a mem bug introduced in commit b2409ad9a7ae) (thanks Pablo Vinuesa) Allow to read non-reversible protein matrix from file. Auto-detected if first entry is negative. refactor protein model definition to make code cleaner refactor buitin_mixmodels_definition using C++11 raw string add ReadParametersString so that non-reversible protein matrix can be loaded Obey user NEWICK tree as rooted if root node has degree of 2 new operation to optimize root position by traversing all branches for non-reversible model Fix negative frequency with +FMK, +FRY, +FWS using scaling instead of diff move optimizeRootPosition into doNNISearch Added assertion to forceFreqsConform update the latest vectorclass 1.30 to resolve compilation error with apple clang --write-branches option to write to file .branches.csv the branch lengths of partition trees corresponding to super tree (requested by Rob Lanfear) Fix output of .branches.csv Fix thread-unsafe model_info with merging models step (reported by Marianne and Karen). Start v1.6.1.a Fix parsing -net option (reported by cameron.weadick) Fix parsing -net option (reported by cameron.weadick) and error message with -fast -bb (reported by Remi Denise) update the latest vectorclass 1.30 to resolve compilation error with apple clang --write-branches option to write to file .branches.csv the branch lengths of partition trees corresponding to super tree (requested by Rob Lanfear) Fix output of .branches.csv Fix thread-unsafe model_info with merging models step (reported by Marianne and Karen). 
Start v1.6.1.a towards supporting list of outgroup sequences via -o option patch v1.6.1.b which supports multiple outgroup taxa via -o option (requested by Andrew Roger) towards fixing refineBootTrees() Fix bug in parsing +P causing infinite loop (reported by Craig Herbold and others) release version 1.6.2 Fix assertion iqtree->root in ModelFinder with partitions when supplying user tree (-t) Fix bug "#leaves and taxa_set do not match" when user tree is rooted for partition model (reported by Dieter Waechter) Fix assertion iqtree->root partition merging for -sp (edged-unlinked) and -mtree options check for distinct partition names Fix crash with -z when input tree is rooted Change 'Too many iterations in tqli' to WARNING instead of abort New -ntmax option to specify max number of threads by -nt AUTO (default: no. CPU scores) Fix LG4M and LG4X frequencies not summing up to 1 Fix wrong abort with -nt AUTO -mtree (reported by Mark Miller) minor patch v1.6.2.a Fix bug -bnni with -spp by disallowing NNI5. Allow bootstrap with heterotachy +H model Change ABNORMAL_TERMINATION_IN_LNSRCH to warning Parallelise SH-aLRT test (problem identified by Heiner Kuhl) Fix ModelFinder to support mixed data types (reported by Stephen Baca) save checkpoint for part_tree Change Numerical underflow for lh-branch to WARNING and arbitrarily fix tree_lh (reported by Karen Siu Ting) Add GTR model for morphological data (requested by Sergio Andrés Muñoz Gomez) New option -j to perform jackknife (requested by Emmanuel Toussaint) Fix mixed data in ModelFinder with proper partition model Fix compilation with MinGW release version 1.6.3 Fix compilation for gcc 4.8 New option --runs to perform multiple independent runs (requested by David Maddison) Fix saving checkpoint for candidateTrees by -n 0 beta version 1.6.3.a with --runs option fix bug parsing +P pomo model string (reported by Carolin) Bugfix, interpretation of +P. Minh, please have a short look at the TODO Minh items. 
The +P flag was not found when it was at the first position of the string. This led to errors, which are fixed now. However, you might have had a reason to start search for +P only at the second position in the given model name. refactor PartitionInfo and move partition names etc. to Alignment class in terms of CharSet Fix compile error refactor partition name, model_name into Alignment class. New runModelFinder() decoupled from runTreeReconstruction Fix crash --runs with -b or -bb Fix printing .contree for --runs option report multiple runs to .iqtree file Fix issue #66 mixture model with GTR20 wrongly assign empirical AA (reported by Dominik Schrempf) Fix -wbt option with .ufboot file containing partial support values by combining -bb and -alrt (reported by Guifre Torruella Cortes) Fix bug with +ASC when some state has very low frequency by removing them from unobservable constant patterns (reported by paul madeira) Reduce upper bound of omega and kappa from 100 to 50 due to numerical instability (reported by aleksas lab) Clarify branch lengths of PoMo in .iqtree file. beta version 1.6.3.b Fix error message with number of threads (reported by David Maddison) Fix crash reading DNA model from file (reported by Benjamin Redelings) Fix crash when reading rooted trees with identical taxa (reported by web user cjp1043) overwrite .ufboot and .splits.nex of the best run disallow --runs and -z release version 1.6.4 New option --show-lh to compute log-likelihood without any optimisation (requested by Benjamin Redelings) release version 1.6.4. Allow specifying 6 GTR rates and G-T to be zero (requested by Ben Redelings). Refactor PartitionInfo Towards allowing unlinked partition trees. New class PhyloSuperTreeUnlinked. Fix bug GTR20 = LG! New option -spu for tree-unlinked partition Model. Add NONREV protein model. Warning about underflow for non-rev lh-branch instead of ASSERT. 
New option --link-model to link substitution model between partitions Fix compile issue in Xcode using Home-brew clang Fix thread unsafety for non-rev kernel for matexp when using temp_space Bugfix: GTR20 rates initialised with LG but not optimised further Let ModelProtein decide its own state frequency type Print ERROR when user tree taxa mismatch alignment instead of ASSERT (reported by pablo.vargas) consistently rename taxa in tree file as done when reading alignment consistently rename taxa in SETS nexus block or cluster file Bugfix: part_info wrongly transferred from boot_tree to tree (-b option) with -m TEST (reported by paul.madeira) release version 1.6.5 ModelMarkov class: Use Eigen3 lib for decomposition of non-reversible models Use Eigen3 for reversible model eigen decomposition. Much faster: New option --root-move-dist for max move distance of root (default: 2). convertToRooted(): move root to middle of longest branch. properly report linked model and unlinked tree Properly initialise linked model with average state frequency across partitions Fixbug forgot to renormalise sum_state_freq Fix initialising state_freq for non-rev model and wrong length of newly moved root in longest branch switch to scaling-squaring if eigen-decomposition is unstable (instead of ASSERT) New option "--model-init" to initialise general protein model (GTR20 or NONREV) from a model file Change default to L-BFGS-B optimization for linked model. New option "--loop-model" for number of optimization iterations. "--init-model" instead of --model-init fix calling optimize...GammaInvar by condensing redundant function reversible eigen3 decomposition: resolve zero-frequency and sum not equal to 1 Fix optimisation of general NONREV (and UNREST) model with FREQ_ESTIMATE --init-model DIVMAT to initialise rate matrix from divergence matrix New option --sym-test to perform Symmetry Test as of Lars Jermiin et al. 
Use Eigen3 for doSymTest New option --symtest-keep-zero to retain NAs and --symtest-pval to set p-value cutoff New option --symtest-remove-bad, --symtest-remove-good to remove bad/good partitions. New option --symtest-type SYM|MAR|INT to set type of symmetry test used to remove partitions. Add usages for tests of symmetry fix isnan compile error Fix compile issue with older gcc/clang with modelmarkov.cpp boost library added. Optionally compute binomial p-value using boost Fix issue with internal tree node names changed from slash to underscore (reported by JP Flandrois) Fix ASSERT with protein mixture models (reported by Giddy Landan) Don't allow --runs with -lmap reverting printing number of all unique quartets Bug reading nexus composite characters for non-DNA data (reported by a web user) support amino-acid O as the 22nd amino-acid (reported by Cuong) Fix bug initFromCatMinusOne() creating negative rates in edge cases (reported by Paul Frandsen) beta version 1.6.6.a Fix mem leak in reading nexus alignment Report error in read trees (-z) instead of assertion in convertToUnrooted() or convertToRooted() (reported by a web user) Fix -bsam with standard bootstrap. Print more assert info in RateFree::optimizeWithEM. Handle IQTREE_FLAGS of nostrip to cmake. Support jackknife with GENE and GENESITE resampling. Shorten code printing "jackknife" or "bootstrap". Fix bug standard bootstrap with -m ...MERGE (reported by Guoqing Li) Wrong error with codon models and -t RANDOM (reported by Karen Meusemann) Error about blank chars in partition file (reported by David Maddison) beta version 1.6.6.b remove unused variables in computeLikelihoodDervMixlenSIMD Update libomp.a for Mac. 
The old one caused freezing when running a long time (reported by David Maddison) add libomp.a for Linux, compiled from OpenMP lib LLVM 6.0.0 Relax assertion about Logl when one NNI applied due to lower numerical precision with codon models (possibly fix Karen's bug report) Error msg when using EM algorithm combining mixture (e.g. C10) with GHOST (+H) model beta version 1.6.6.c release version 1.6.6 IMPORTANT: Fix bug ModelMixture checkpoint restoring wrong prop (reported by Juergen Strassert). As result checkpoint entry is changed. More error checking with CKP_ARRAY_RESTORE Fix checkpointing of ModelLieMarkov and conform with CKP_ARRAY_SAVE beta version 1.6.7.a Bug introduced in v1.6.6 in extractSiteID: incomplete parsing of position range at the first blank char (reported by Alexey Kozlov) allowing ambiguous states from NEXUS file again (was not properly removed during the merge from 1.5 to v1.6 (reported by Steven Heritage) Bug cause using only 1 thread in assigning num_threads when #partitions < #cores (reported by Alexey Kozlov) Reduce RAM usage for -bnni option (reported by Juergen Strassert) Temporarily disable AU test due to unresolved numerical issue print ERROR instead of ABORT when first taxon does not exist in user tree file (reported by a web user newseed97) ERROR when reading nexus file with invalid MATRIX command (reported by a web user) Introduce bounds for optimizeTreeLengthScaling, in extreme cases where branch lengths may get too long (reported by web user vikaszsi77) Do not include ORDERED into default MORPH model selection due to frequent numerical underflow (reported by web user) Bug in v1.6 with MERGE in concatenateAlignments with missing data (reported by David Dunchene) print error message instead of crash when using -bsam with non-partition model (reported by Sarah Jensen) release version 1.6.7 cmakelist update New option --save-mem-buffer to save traversal_info buffer memory set STT_USER_TREE for -te option to avoid initializePLL reduce 
RAM usage for -spu New option --no-seq-comp to disable sequence composition test Fix buffer allocation and FMA non rev kernel assignment assign AVX non rev kennel for protein Bug in branch direction converting back and forth between rooted and unrooted tree Reset nondiagonalizable after matrix exponentiation Rewrite computeStateFreqFromQMatrix to use Eigen3 lib to avoid segfault. Option --model-epsilon for -me comment out matexp Avoid converting (un)rooted tree twice for ModelProtein Add SuperAlignmentUnlinked class for completely unlinked partition alignments Speed up loading time for big data with -spu Do not build taxa_set for -spu to save space Fake one Pattern for SuperAlignmentUnlinked to save space Show more precision in log likelihood with --show-lh This makes it possible for testiphy to test IQ-Tree correctly. Its possible that a precision of 16 gives all the digits, but with rounding and such its possible you need 17. Only increase precision for log-lh if VB_DEBUG or higher. merge master into latest Allow tree-search for unlinked gene trees Allow ModelFinder for -spu unlinked partitions Reimplement -tina and -st MULTI option merge and resolve conflicts with master Allow overlapping taxon sets for -spu Parallelise doTreeSearch for PhyloSuperTreeUnlinked Crash --runs with partition model (reported by David Maddison) new function setNumThreads decoupled from setLikelihoodKernel setNumThreads reduces #threads for very short alignment patch version 1.6.7.1 merge with master Integrate Booster - transfer bootstrap expectation (TBE) (code from https://github.com/evolbioinfo/booster) Add booster folder usage for --tbe Make sure that ASSERT does do not real work merge master into latest Add .DS_STORE to .gitignore. Integrate Terraphast as an optional module. Terraces - improve output text. Set USE_TERRAPHAST to ON by default. Rename terraces to terraphast in CMakeLists. 
Merge master into feature-terraphast Change USE_BOOTSTER to option make -sup to assign supports for trees with un-equal taxon sets use map for fast lookup taxon name in computeRFDist make terraphast compatible with C++11 update terraphast for c++11 support print best model parameters into .best_model.nex file Fix compile error with old GCC Fix setStateFrequency rescale custom substitution rates to a max of 10 for protein model Usage for -spu, -sup2, -rf2 options Beta version 1.7-beta rename booster crc32 function to resolve conflict with zlib New option --estimate-model to estimate model params for tree evaluation (-z). -optfromgiven works with free rate model Properly disable writing .log file for PhyloSuperTreeUnlinked::doTreeSearch. Introduce cmust to must write something to cout and log file checkpointing while estimating linked model across partitions New option --mlrate to compute site rates by ML (requested by Alex Dornburg and Jeff Townsend) fix initLeafSiteParsForAmbiguousState. refactor convertState to conform with StateType move Sankoff parsimony functions from ParsTree to PhyloTree new option --mpcost to input a cost matrix file for parsimony Fix edge-case bug in computePartialParsimonyFastSIMD when setting dummy states merge master into latest Move all Sankoff parsimony to phylotreepars.cpp. Fix SIMD_BITS for AVX512 -mset option accept string with prefix + to add a model into existing list Fix bug mixing rooted and unrooted trees for -spu (reported by Cuong Dang) beta version 1.7-beta2 new option --site-concordance to compute site concordance factor (Matt Hahn and Rob Lanfear) expand the parameter bounds for linked model Allow partition_file to be a directory and read all alignment files within the directory Revert getBitsBlockSize for SIMD_BITS bug. setParsimonyKernel for PhyloSuperTree for properly set kernel change --symtest to --bisymtest revert main.cpp in ordering of calling runPhyloAnalysis Do midpoint rooting for convertToRooted. 
New option --root-find to find the root position for model parameter optimisation. --support as -sup2. Print branch supports to .support file Fix printing ModelProtein::getNameParams New option --model-joint to enforce joint model name for all partitions Some debugging output. Update Terraphast. Add SSE_FLAGS to terrace, terraphast and booster Fix tip_partial_lh_size mem alignment for AVX512 (PhyloSuperTreePlen::initializeAllPartialLh) rescale down parsimony branch length due to over-estimation. TODO: more proper with parsimony ancestral reconstruction properly compute parsimony branch lengths by ancestral construction. -bl-eval default to 1 (instead of 2) thanks to better parsimony branch length Use F81 correction instead of JC and Gamma model if available Make Sankoff algorithm aware of tip nodes handle ambiguous states and cleanup sankoff parsimony reorder loop for multifurcating node SIMD version for Sankoff parsimony resolve multifurcation of constraint tree efficiently Fix ASSERT in getNeiBranches merge resolveMultifurcationParsimony and refine computeParsimonyTree further refining computeParsimonyTree Facility to add branch attributes Enable parallel computation of parsimony trees with thread-safe Fix computePartialParsimonyFast crash for rooted tree Fix crash consensus tree construction with rooted trees beta version v1.7-beta3 Fix crash with -sp (edge-unlinked partition model) for rooted PhyloSuperTree and non-reversible model Fix -spp with rooted PhyloSuperTreePlen Fix crash mixing REV and NONREV models in partition model newNeighbor() as elegant way to allocating new Neighbor (and sub-class) pointer Put msg "Converting rooted to unrooted" to medium verbose mode Crash with -spu for rooted trees Crash in initCandidateTreeSet with rooted tree by own parsimony kernel convert rooted constraint tree into unrooted with a warning instead of error Fix crash with -sp and -g Fix -tina option usage for --mlrate decouple testPartitionModel from testModel New option 
--modelomatic to find best codon, AA or DNA models (Whelan et al. 2015) compute the adjusted degree of freedom for --modelomatic -mset modelomatic to set the same models as ModelOMatic for comparison purpose --modelomatic works with partition model now add StateSpace class and yaml-cpp library refine class StateSpace start phyloYAML library create a namespace PML for Phylogenetic Markup Language based on YAML StateSpace data conversion added Refactor assignBranchSupport. Rename option --gcf, --scf Remove BranchSupportInfo and use branch attributes instead Fix UFBoot with -spu unlinked partitions Fix SH-aLRT test with -spu unlinked partition trees Fix printing ufboot trees (-wbt) with -spu New option --bisymtest-stat to print pairwise statistics of tests of symmetry (requested by Suha Naser) new option --permsymtest NUM to do permutation test of symmetry of Eric Stone (requested by Suha Naser) parallelize symtest computation print the null distribution of permutation test of symmetry add travis CI config Fix crash with mixed data type or partitions with non-dna/prot data (reported by Paul Frandsen) patch version 1.6.7.2 Towards fixing AU problem, break the for step loop when many BPs are 0 or 1 For tree testing: include original alignment into the bootstrap samples Beta version 1.7-beta4 Fix crash in model testing when partitions <= 2 (reported by Michele Leocadio) AU test now uses MLE p-value estimate ModelMarkov read non-reversible model file properly now. Change error message "A taxon has no name" to "Redundant double-bracket ‘((…))’ with closing bracket ending at" release version 1.6.8 Adjust logl and df for --modelomatic properly Introduce -p option for -spp. New option -S for tree-unlinked inference. 
Introduce --option syntax and adjust usage Refactor -j / -J jackknife option and refine usages Further refinement of help usage Beta version 1.7-beta5 safe-check state_freq < MIN_FREQUENCY when optimising NONREV model parameters cosmetic changes in terraphast (code formatting) update terraphast - make the compilation Visual C++ compatible - make all integer operations architecture-agnostic (32/64bit) setup travis update travis update README build and license countPhysicalCPUCores uses built-in function safe-guard nondiagonalizable matrix compute sCF with super-alignment and refinement Refactor --gcf option to take --tree FILE Further refine cf.* outputs Fix computing sCF for super-tree compute gene discordance factor. totally using branch attributes New option --scf-part to print concordant/discordant sites for branch and partition. This helps to find g50 statistic. Fix sCF for .sf.tree file. Change BranID column label to ID Refactor --gcf option to take a trees file. --help usage for concordance factor analysis. beta version 1.7-beta6 Fix --scf for rooted input tree (reported by Aaron Liston). syncRooting exit if no gene trees. MTree print no branch length if length = default -1.0 Properly loading of rooting tree in MTree and MTreeSet Print NA to cf.stat_locus file if locus is not decisive for the branch (requested by Rob Lanfear). --cf-verbose (instead of --scf-part) to print cf.stat_locus and cf.stat_tree file. 
.cf output beautified parallel sCF computation --bisymtest-remove-bad[-good] will write .good.nex or .bad.nex partition file (requested by Renee Catullo) beta version 1.7-beta7 Fix -asr when some state(s) are absent in computeTransMatrix by switching to matrix computation with Eigen3 lib ModelLieMarkov now uses general-purpose eigen decomposition and matrix exponentiation from ModelMarkov print concatenated alignment for .good.nex and .bad.nex (--bisymtest-remove-bad/-good) More strict error checking for MSetsBlock due to misspecified comma and semi-colon (reported by Denis Jacob Machado) safer writing of checkpoint file for very large analysis by a .tmp file and rename this file once writing done (reported Giuseppe Aprea) warning about failed writing checkpoint file concatenateAlignments() merges two codon partitions with different genetic codes (reported by Morgan Jackson) -rcluster-max will now activate rcluster algorithm (without having to specify -rcluster) --no-outfiles will now suppress .treefile as expected (reported by Cecile Ane) Fix --runs with -bnni (reported by Vanessa Vera Fain) Bug introduced in v1.6.6 in SuperAlignment::createBootstrapAlignment in SCALE bootstrap for AU test (reported by Ales Bucek) release version 1.6.9 Implement safe numerical likelihood kernel for nonreversible models (requested by Vinh and Cuong) Fix decomposeRateMatrix with Eigen3 lib for MG-type model by checking ignore_state_freq (reported by Nathanael Walker Hale) -nt AUTO works with new -S option now Fix crash with -o option for the new -S option (reported by Dan Vanderpool) Fix assertion !ordered_pattern.empty() in fixNegativeBranch when there are no variant sites in the alignment (reported by Dan Vanderpool and Samuel Church) beta version 1.7-beta8 Fix compiling issue with OpenMP lib linking Fix crash with edge-unlinked (-sp) and -bnni option (reported by Mark Miller) merge master into latest New --maxsymtest option as synonym to --permsymtest Do not install yaml-cpp lib 
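The "safer writing of checkpoint file" entry above describes writing to a .tmp file and renaming it once the write has finished. A minimal sketch of that pattern follows; `writeFileViaTemp` is a hypothetical helper name and not IQ-TREE's actual checkpoint code, and the fallback branch assumes a platform where `std::rename` refuses to overwrite an existing file.

```cpp
#include <cassert>
#include <cstdio>
#include <fstream>
#include <string>

// Hypothetical helper (not the IQ-TREE implementation): write `contents`
// to "<path>.tmp" first, then rename the temporary file over `path`, so a
// crash in the middle of a write never leaves a truncated file at `path`.
bool writeFileViaTemp(const std::string& path, const std::string& contents) {
    const std::string tmpPath = path + ".tmp";
    {
        std::ofstream out(tmpPath.c_str(), std::ios::binary | std::ios::trunc);
        if (!out) return false;
        out << contents;
        out.flush();
        if (!out) {                       // the write failed (e.g. disk full)
            out.close();
            std::remove(tmpPath.c_str());
            return false;
        }
    }                                     // the stream is closed here
    if (std::rename(tmpPath.c_str(), path.c_str()) == 0) return true;
    // Assumption: some platforms will not rename onto an existing file,
    // so remove the old file and try once more.
    std::remove(path.c_str());
    if (std::rename(tmpPath.c_str(), path.c_str()) == 0) return true;
    std::remove(tmpPath.c_str());
    return false;
}
```

A caller would treat a `false` return as the "failed writing checkpoint file" case and warn rather than abort.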
Less stringent ASSERT on symmetry of Q (reported by Bradley Martin). Warning instead of Error on numerical underflow. beta version 1.7-beta9 release version 1.6.10 cleanup MPI code Support -nt AUTO for MPI version: detect if processes are running on the same host resolve conflict with master Fix merge issue after MPI cleanup automatically increase checkpoint interval to be at least 20 times dumping time For edge-unlinked partition model (-sp) partition trees with unlinked branch lengths will be printed to .parttrees file (requested by Stephen Crotty) Better handling of zero state_freq for nonrev models. New option --min-freq to control minimum state frequency (default 1e-4). use modelmarkov decomposeratematrix Fix ModelPoMo to handle rates instead of rate_matrix. Take into account min_gamma_shape passed via --alpha-min option. Fix +FO estimation for ModelDNA Allow to compile OpenMP version with apple-clang (at least Mac OS High Sierra 10.13). Make boost library required by default. remove unused parameter comment, todo Allow Xcode indexing with compiling again update yaml-cpp master branch bd7f8c6 add rest of yaml-cpp change BUILD_SHARED_LIBS from variable to option Fix --model-joint with -spp (edge-linked partition model) Fix problem GTR20+FO optimisation. 
Redundant double optimisation in optimizeLinkedModel Towards fixing -spp with --model-joint New option --df-tree to print tree for gDF1 into cf.stat file (requested by Dan Vanderpool) Beta version 1.7-beta10 New option --root-test to test all root positions and print .rooted_trees --root-test can be combined now with -zb to perform tree topology test and report to .iqtree file change default zlib compression level from 9 to 1 for speed Comment out PartitionModelPlen::getNDim that distort --model-joint option adjust computing df properly for linked models adjust .iqtree report file for linked/unlinked models properly Return reduced memory requirement for ModelFinder Bug fix in decomposeRateMatrix with Eigen3 lib when some state_freq is nearly zero Same fix with state_freq nearly zero for non-reversible model Beta version 1.7-beta11 -ao (--out-alignment) will print partition file if applicable Bug fix: Reimplement --link-model and --model-joint to be memory thread safe checkpoint linked_models and omit checkpointing individual models. remove Warning about weird state_freq Properly reporting joint model to .iqtree file --model-joint works now without -m Checkpointing for ModelUnrest Move fixed_parameters to ModelSubst Properly optimising linked +FO model shorten .iqtree report for linked models assign joint empirical state_freq as for single model avoid repeatedly printing linked ModelMarkov model parameters in .iqtree report refactor state_counts from unsigned to size_t to avoid potential overflow fix assertion isRootLeaf(dad) in computeNonrevLikelihoodDervSIMD, because that can happen for partition model with missing data, when the whole subclade below the root is missing custom output for linked_models Beta version 1.7-beta12 Relax ASSERT for rooted branch lengths. 
PartitionModelPlen::optimizeParameters does not optimise branch lengths at the very beginning numerics with non rev lh-derivative IMPORTANT: change gcc optimisation level to -O2 (instead of -O3), which caused numerical issues refactor MIN_FREQUENCY to obey --min-freq option add Xpreprocessor to CMakeLists.txt so that apple-clang can compile openmp IMPORTANT: change gcc optimisation level to -O2 (instead of -O3), which caused numerical issues Fix bug -bnni with -bsam option (reported by Emmanuel Cantu) Support '.' for alignment length in partition file (requested by Teofil Nakov) Fix setting part_info for -bsam with -bnni Fix -bnni with partition models and GENE resampling Fix WARNING about bootstrap convergence for -bnni option (reported by Max_IT) Fix crash after merge IMPORTANT: amino-acid frequencies of DAYHOFF, DCMUT, JTT, MTREV, WAG, CPREV, LG, MTZOA, PMB, JTTDCMUT, FLU are slightly modified with higher precision and normalised to 1. Fix numerical issue with likelihood scaling for gcc with -O3 by using ldexp Fix ModelDNA::setRateType() to account for fixed substitution rates (reported by @bbuchfink) WARNING when constraint tree is comprehensive (has all taxa and bifurcating) only set number iterations for -fast if not set before Fix compile error with clang under linux refine JTT freqs with higher precision change isnan / isinf check to !isfinite for likelihood kernel release version 1.6.11 Reimplement --symtest considering sequence pair with max divergence (requested by David Bryant) Fix bug in computing ML distances (.mldist) for +I or +R models Fix an edge case with -o option when the outgroup is identical to another sequence (reported by Yang Song) Crash with model selection of mixed datatype with -spp (reported by Bernardo Santos) Update README.md with download stats Fix README.md with download link New option --symtest-only to do SymTest then exit Beta version 1.7-beta13 refactor unobserved_ptns to store vector of Pattern for more general +ASC model 
+HOLDER ascertainment bias correction works now Newton-Raphson branch length optimisation works now with Holder's ASC Introduce ASCType to make the code cleaner and to better distinguish between ASC models Naive implementation of +ASC_INF for informative sites ASC correction Refactor printPhylip to allow printing informative sites to .infsites.phy file minor cleanup Report error with -sup when target tree is not as "rooted/unrooted" as tree set (reported by Christian Rinke) Fix crash with -nt (openmp) in clang+windows due to compiler issue -Xpreprocessor (reported by Wo niu) Fix crash with -wbt and non-rev models Beta version 1.7-beta14 New option --fast-merge to test only one model (GTR for DNA, LG for protein) during the PartitionFinder merging phase New option --merge-kmeans to do "one shot" kmeans clustering of partition rates for PartitionFinder do not restore best_model at the beginning of testModel(), so that the loop can determine new best model in a new set beta version 1.7-beta15 --merge-kmeans will do the one-shot k-means clustering now Fix k-means clustering New option --merge-log-rate for rcluster algorithm on log-transform of treelen transfer model parameters and tree lengths during partition merging Important change: optimizeTreeLengthScaling() now considers wider lower-upper bound New option --tree-dist1 to compute tree distances between k-th and k-th tree of two sets New option --normalize-dist to normalize tree distance into [0,1] New option --merge-fastest to only consider either +G (no const sites) or +I+G (with const sites) rate model New option -rf1 to compute Robinson-Foulds distance between corresponding k-th trees of two tree sets start version 1.7-beta16 New option --out-csv to print .rfdist file in csv format Option -T, --threads to set number of threads (synonym to -nt) Only abort in optimizeModelParameters if not TOPO_UNLINKED Remove L-BFGS-B by default due to numerical issue (NAN) (thanks Rob Lanfear) Refactor options for PF2. 
New --merge, --merge-model, --merge-rate options. New --edge option to set branch length linking between partitions. Fix printing help usage ModelFinder now computes fast ML tree instead of the default MP tree rcluster algorithm will now consider merging pairs with both normal and log of treelen. --merge-normal-rate to revert behaviour (no log-transform) Fix computing fast ML tree for ModelFinder2 Adjust epsilon for fast ML tree search. Transfer model parameters from ModelFinder to tree search. rescaled tree length before transfering model parameters New option --mf-epsilon to set log-likelihood epsilon for ModelFinder optimisation Only update checkpoint with parameters of the best model found iEquals(a, b) functions for case-insensitive comparison between two strings Allow to read alignment file as a comma-separated list of files, that will be concatenated into one alignment Allow -s option to read alignments in a directory, which are concatenated into one alignment remove slash in out_prefix Refactor code with class CandidateModel for more object oriented model testing Merge ModelInfo class into CandidateModel split ModelFinder and tree topology testing into two source files refactor testModel into CandidateModelSet and merge testModelOMatic Fix some typos Cleanup Refactor seq_states from Alignment to ModelSubst for thread-safety. Parallelize ModelFinder across models instead of alignment sites. reorder model names with most common model coming first Decouple subst_name and rate_name in CandidateModel Default --rates AUTO option to automatically choose the best rate heterogeneity. --rates ALL to revert the old behaviour. AUTO mode to filter out non-promising substitution models Fix -S option with computeFastMLTree. Only AUTO-selecting rates (not substitution models). Support partition file with tree length in curly bracket after the partition name. 
Print this information to best_model.nex file Fix compile issue in Linux beta version 1.7-beta17 fix likelihood mapping (-lmap) with refactored seq_states Remove seq_states for less error prone and thread-safety fix cout end quartet computation Fix optimizeLinkedModel by switching stable BFGS Update phylotree.cpp Fix compile issue with Win 32-bit and add SSE3 to main New option --subsample and --subsample-seed to subsample a number of partitions (for Rob Lanfear) start version 1.7-beta18 version 2.0-rc (release candidate) Fix reading NEXUS file with datatype protein Introduce wrapper Alignment::printAlignment. Accept - in charset name in nexus file release candidate 2.0-rc1 Change --subsample s.t. with same --subsample-seed the smaller subsamples are a subset of larger subsamples When alignment file and partition file are the same, one now only needs to specify partition file to speedup reading --subsample accepts negative number for complement sampling --gcf now print nexus tree file .cf.tree.nex for annotated tree best viewed in FigTree --scf also writes annotated tree and the absolute numbers for sCF_N, sDF1_N and sDF2_N Update tools.cpp Update modelprotein.cpp Update tools.cpp update travis boost lib Revert to the old Numerical Recipes eigen decomposition if Eigen3 failed start release candidate 2.0-rc2 reverting to old NR eigen-decomposition if some eigenvalues are positive Fixing issue with terrace analysis: the code is not called if there is no ml tree. The option is turned off by default. 
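The --subsample entry above says that, for a fixed --subsample-seed, smaller subsamples are subsets of larger ones. One way to get that property (an assumption about the approach, not a copy of IQ-TREE's code) is to shuffle the partition indices once per seed and always take a prefix. `subsamplePartitions` is a hypothetical name used only for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <random>
#include <vector>

// Hypothetical sketch: pick k of numPartitions partition indices so that,
// for a fixed seed, every smaller subsample is contained in every larger one.
std::vector<int> subsamplePartitions(int numPartitions, int k, unsigned seed) {
    std::vector<int> order(numPartitions);
    std::iota(order.begin(), order.end(), 0);        // 0, 1, ..., n-1
    std::mt19937 rng(seed);
    std::shuffle(order.begin(), order.end(), rng);   // one permutation per seed
    k = std::max(0, std::min(k, numPartitions));     // clamp k into [0, n]
    // Taking a prefix of the same permutation is what makes the samples nested.
    std::vector<int> chosen(order.begin(), order.begin() + k);
    std::sort(chosen.begin(), chosen.end());         // report in partition order
    return chosen;
}
```

Under this scheme the complement sampling requested by a negative number could be taken from the remaining suffix of the same permutation, though the sketch does not implement that.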
Accept '|' char in taxon names (requested by @rambaut) Fix NONREV protein model initialisation unintentionally converting rooted input tree (reported by Suha Naser) Fix copyTree for a taxa_set to work with rooted tree properly (reported by Suha Naser) Fix -t BIONJ with ModelFinder2 (reported by @kevinliam) Disallow -b and -S together add LSD2 module Add LSD2 for least square dating and new option --dating LSD dating works with outgroup (-o option), all output files to timetree.* and new option --dating-options to pass into LSD Accept / in taxon names and turned off PLL as it does not accept such names Introduce --date option (not functional yet) and move error checking in parseArg to outside the for loop New option --date to input date file and accept YYYY-MM-DD date format, but better do this directly in LSD internal interface with LSD2 instead of files adapt API with LSD2 Update citations and usage for options --date and --dating update LSD2 module update lsd2 module New option --date-root and --date-tip to provide root and tip dates (for all tips) New option --date TAXNAME to extract the dates from taxon names in "taxname|DATE" format --date-debug option to print internal files used for LSD Perform dating when identical trees were already reinserted less verbose screen for dating release version 2.0.3 support x:y date range, taxon list for MRCA and # for comments in date file Fix compilation error when -DUSE_LSD2 is not defined release candidate 1.6.11.1 release version 1.6.12 Move error message in parseArg() to outside for loop as it had no effect (reported by Panagiotis Adam) CMAKE seems to no longer support using -I in add_definitons(), must use include_directories() instead Fix compile error with -DIQTREE_FLAGS=single, thanks to @ilbiondo update lsd2 IMPORTANT: Refactor --redo option, introduce --redo-tree and document --undo IMPORTANT: Rename binary to iqtree2 Fix issue #129: MPI version hangs (thanks @poquirion) adjust CMake target for xcode 11.4 Fix 
thread-unsafe in inserting meanings for site-concordance (reported by Dan Vanderpool). Integrate the latest LSD2 that fixed negative root branch. Allow computing date confidence interval with --date-ci and --clock-sd Spit out error when dates are not in right format Allow date range with missing upper and lower bound. Remove conversion of incomplete date e.g. YYYY-MM as this is now supported in LSD2 New option --date-outlier for -e option in LSD2 start unofficial release 2.0.5 Update modelprotein.cpp Update tools.cpp Fix bug with GHOST model in readTreeString (reported by Miles Zhang), which affected checkpointing and -bnni update the latest lsd2 code and adapt API accordingly Quick fix crash with -bsam GENE for standard bootstrap or -bnni (thanks Diep) Fix issue #140: Accept --mrate as printed in the help message (reported by @tseemann) Fix issue #134, -nt AUTO to -T AUTO (reported by @tseemann) fix issue #132 start version 2.0.6 link to lsd2 commit 2c6b534fd Bugfix: in --symtest, take the maximum divergence pair normalised (reported by Peter Foster) Allow frequency mixture for binary model (thanks Edward Braun) update latest lsd2 New option --cf-quartet to print sCF of all resampled quartets (requested by Benjamin Rosenzweig and Matt Hahn) change version number to 2.0.7 draft modify site concordance factor to work on multifurcating trees refactor computeTipLikelihood into each model class for seq error model Missing line in Matrix::removeRow in bionj2.cpp making its BIONJ implementation go wrong. Alignment::removeIdenticalSeq now calculates sequence hashes in parallel, so sequences only need to be compared if they have the same hash. Also: added a listSequences bool in Alignment::checkSeqName (I intend to define a parameter for suppressing the list of sequences). In utils/bionj2.cpp, added a draft of BoundingBIONJMatrix (subclass of BIONJMatrix that adapts most, but not *quite* all, of the ideas from RapidNJ). 
Rows of the S and I matrices (see the RapidNJ paper) are sorted via a template function, mirroredHeapsort, defined in utils/heapsort.h. Also added VectorizedBIONJMatrix which is hand-vectorized (it doesn't seem to be any faster than the un-vectorized version, which suggests the compiler is vectorizing the latter). SuperAlignment::removeIdenticalSeq now hashes sequences (in parallel) so that it can identify duplicates quicker. Alignment::removeIdenticalSeq now logs information about the time it spent hashing, and the time it spent checking for identical sequences, only if verbose level is medium or higher. Rapid NJ implementation in utils/bionj2.cpp now uses the tighter bound heuristic detailed in section 2.5 of "Inference of Large Phylogenies using Neighbour-Joining", Simonsen + Mailund + Pedersen (2011). See code that sets up and uses scaledMaxEarlierClusterTotal in BoundingMatrix. BoundingMatrix can now subclass either NJMatrix or BIONJMatrix. Did away with the "#pragma omp critical" directives in BoundingMatrix (they were hurting parallel performance). Removed some log-lining (that had already been commented out). Note that: BoundingMatrix is still not giving answers consistent with the old BioNj class or BIONJMatrix. Vectorized version of BIONJ now works (or at least seems to! Further testing is needed). VectorizedBIONJMatrix is now vectorising (it wasn't before, as I'd forgotten to update the signature of its getRowMinima() member function). Matrix constructor ensures row-starts are 64-byte aligned, even if the array allocated for the matrix data wasn't. VectorizedBIONJMatrix, moved its scratchTotals and scratchColumnNumbers so they're std::vector instances, owned by the class (so no need to allocate them over and over). Those two vectors are oversized by 64/sizeof(NJFloat) so that the pointers into them can start at a 64-byte aligned address and Vec4d::load_a can be used instead of the slower Vec4d::load. 
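The oversizing trick described just above (vectors padded by 64/sizeof(NJFloat) elements so that a 64-byte aligned pointer can be found inside them) can be sketched as follows. The helper names `makeOversizedScratch` and `firstAlignedElement` are hypothetical and are not taken from utils/bionj2.cpp; the sketch assumes the alignment is a multiple of sizeof(T), as it is for float and double with 64.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: allocate room for n elements plus enough slack
// (alignment / sizeof(T) extra elements) to skip forward to an aligned address.
template <class T>
std::vector<T> makeOversizedScratch(std::size_t n, std::size_t alignment = 64) {
    return std::vector<T>(n + alignment / sizeof(T));
}

// Hypothetical helper: return the first element of `buffer` whose address
// is a multiple of `alignment` (assumed to be a multiple of sizeof(T)).
template <class T>
T* firstAlignedElement(std::vector<T>& buffer, std::size_t alignment = 64) {
    const std::uintptr_t address = reinterpret_cast<std::uintptr_t>(buffer.data());
    // Round the start address up to the next multiple of `alignment`.
    const std::uintptr_t aligned = (address + alignment - 1) / alignment * alignment;
    return reinterpret_cast<T*>(aligned);
}
```

Because fewer than 64 bytes are ever skipped, the n usable elements still fit inside the padded vector, and the vector can stay a long-lived class member rather than being reallocated on every call.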
Ignore constant sites in distance calculation Fix Alignment::computeObsDist, revert change in AlignmentPairwise::AlignmentPairwise clarify on STATE_UNKNOWN for ModelPoMo. Function ModelSubst::useRevKernel to know if using reversible likelihood kernel New DNA model +E for sequencing error, e.g. GTR+E+G Move multiplyWithInvEigenvector outside of computeTipLikelihood Changed how BIONJ et cetera trees are exposed to other parts of iqtree. Now done through StartTree::Factory, defined in utils/starttree.h and implemented in utils/starttree.cpp. -starttree parameters NJ, NJ-R, NJ-V, BIONJ, BIONJ-R, BIONJ-V are "advertised" in utils/bionj.cpp (via addBioNJ2020TreeBuilders). Renamed bionj.h to bionj.cpp (since it isn't really a header). Added an adapter (BIONJ2009Adapter) to make BioNj look like one of the other tree builders. Alignment::checkSeqName() will now suppress list of sequences if --suppress-list-of-sequences parameter passed. PhyloTree::removeIdenticalSeqs will now suppress list of identical sequences if --suppress-duplicate-sequence parameter passed. computeInitialDist() corrected typo in logged line (mentioning Jukes-Cantor). Simplified #pragma omp stuff in bionj.cpp (just use #pragma omp parallel for). Added -experimental command-line parameter (for turning on experimental distance matrix calculation). BoundingMatrix::decideOnRowScanningOrder (used in RapidBioNJ) no longer fully sorts row minima. It partially sorts, by comparing the first half with the second (swapping those out of order), then (iteratively) sorts the first half. What matters is getting the rows that will *probably* have the lowest (minimum) Q values to the front. The relative order of the other rows is beside the point. Added PhyloTree::computeDist_Experimental (which is used if -experimental parameter is passed). 1. Alignment::summarizeSites figures out which sites matter for distance calculation (And stores the result in an AlignmentSummary). 2. 
A matrix (sequenceMatrix) (row per sequence, column per relevant site) is constructed (Basically a transposition of the site-major sequence-is-column data in the Alignment). 3. That is supplied to computeDistanceMatrix (which could be moved out of PhyloTree), Which uses a vectorised hammingDistance() function to calculate observed distances. The -t option needs to know what the recognized tree names are (just as -starttree does), in tools.cpp. Also mapped "" (didn't say what tree) to BIONJ. So, even if initial tree parameter isn't supplied, tree look up shouldn't result in a segfault. Implement nucleotide-specific error model of Nicola de Maio with +EA, +EC, +EG and +ET Handle +E in createModel so that it can be specified within mixture Added copyright and license notice to NJ and initial tree construction source and header files. Checks for duplicated sequence names use std::unordered_set<std::string> rather than for-loops. (In practice, this doesn't seem to make much difference to performance). Refactored AlignmentPairwise (consolidating allocation in constructors, and deallocation in destructors, so that matrices can be reused), and the same AlignmentPairwise can be re-used to compute multiple branch lengths. PhyloTree.cpp now uses distanceProcessors (a vector of AlignmentPairwise*) for that purpose. None of this makes much difference to performance (yet!). But it reduces memory churn and swapping of cache lines between cores. -experimental now does some ML distance calculations, at least when rates and models are not site-specific (see AlignmentPairwise::setSequenceNumbers) using a "flat" sequenceMatrix, provided by an AlignmentSummary instance. AlignmentSummary class now has its own .cpp and .h file. Alignment::summarizeSites... logic moved to AlignmentSummary constructor. AlignmentPairwise now does its matrix allocations "up-front", and can be re-used. 
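The row-per-sequence matrix and the hammingDistance() step described above can be illustrated with a scalar sketch. The struct `FlatSequenceMatrix`, its field names, and the function signature below are assumptions made for illustration; they are not the definitions in AlignmentSummary or utils/hammingdistance.h, and the real function is vectorised.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical flat layout: sequence s occupies
// states[s * siteCount .. (s + 1) * siteCount), so the sites of one
// sequence are contiguous in memory (row per sequence, column per site).
struct FlatSequenceMatrix {
    std::size_t sequenceCount;
    std::size_t siteCount;
    std::vector<char> states;       // sequenceCount * siteCount entries
    std::vector<int>  frequencies;  // weight of each site (pattern frequency)
    const char* row(std::size_t s) const { return states.data() + s * siteCount; }
};

// Scalar sketch of a frequency-weighted Hamming distance. Sites where either
// sequence has the unknown state are not counted as differences; their total
// weight is returned through unknownWeight so the caller can adjust the
// denominator of the observed distance.
int hammingDistance(char unknownState, const char* a, const char* b,
                    const int* frequency, std::size_t siteCount,
                    int& unknownWeight) {
    int distance = 0;
    unknownWeight = 0;
    for (std::size_t i = 0; i < siteCount; ++i) {
        if (a[i] == unknownState || b[i] == unknownState) {
            unknownWeight += frequency[i];
        } else if (a[i] != b[i]) {
            distance += frequency[i];
        }
    }
    return distance;
}
```

Walking two contiguous rows in lockstep is what makes the loop easy for a compiler (or hand-written SIMD) to vectorise, which is harder with site-major storage.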
hammingDistance() functions have moved to utils/hammingdistance.h, and now sum frequencies for the sites that are unknown. This slows them down a bit, but makes what they do closer to what Alignment::computeObsDist (the function that they replace) already did. PhyloTree's computeDistanceMatrix free function no longer counts unknown states (or frequencies of constant sites) towards the denominator. PhyloTree has several new member functions (And a new summary member) for setting up (and accessing) an AlignmentSummary (and its flattened sequence matrix) during distance matrix calculations. That stuff is only turned on when -experimental is true. I am still seeing the occasional crash when -experimental is *not* set, but I haven't figured out the cause of those crashes, as yet. Changes aimed at getting rid of a sometime crash in PhyloTree's destructor: Fixed up an initialization (?) issue in one of the PhyloTree constructors (it didn't call init). PhyloTree::doneComputingDistances cleans up more aggressively. -starttree UPGMA now an option. StartTree namespace class changes: Position, Positions, Link, Cluster, ClusterTree take a T parameter. NJMatrix, BIONJMatrix, BoundingMatrix and VectorizedMatrix take an additional T parameter (*first*). writeTreeFile has moved to ClusterTree from NJMatrix Added UPGMA_Matrix<T> template class (it's slightly simpler than NJMatrix, so I've Added it as a superclass of NJMatrix). But, note, its finishClustering() member function is probably not 100% valid (yet). The formulae it uses: just my best guess. PhyloAnalysis.cpp (A), PhyloTree (B), changes (to allow initial trees to be fed distance matrices At the same time that the distance file is being written). A1. computeMLDist (extra parameters so it can report wall-clock time as well as CPU time). It writes a distance file (because computeDist, which it calls, no longer does). It transfers ownership of distance matrices rather than copy-then-delete-original. A2. 
ComputeDist (align, distance-matrix, var-matrix) no longer passed a dist_file (Because it no longer writes distance files). B1. Separated out decideDistanceFilePath and printDistanceFilePath member functions. B2. PhyloTree::computeBioNJ tries to do distance matrix processing + writing distance File in parallel. Doesn't take Alignment and dist file parameters (it was always Using the current instance's alignment and dist file anyway). B3. PhyloTree's computeDist and computeObsDist member functions no longer passed dist_file (as they no longer write them). C1. Distance matrix classes no longer have constructors that take file paths. Instead there are separate loadMatrixFromFile and loadMatrix member functions. C2. Added draft of VectorizedUPGMA_Matrix. Missed out a one-liner change in tree/iqtree.cpp in last commit. Allow to specify error probability like +E{0.05} Corrected the mistake that was making BIONJ-R and NJ-R tree construction go wrong: In BoundingMatrix's getRowMinima member function, maxEarlierTotal, which has to be declared an NJFloat, was declared as size_t (so it was getting set to zero). The net effect was that the first entry looked at in the S matrix always "won" (and was treated as though it corresponded to the minimum entry in the Q matrix). Corrected two vector initialisations that assumed the width of the V/FloatVector template parameter was always 4. Since, when you're using T=single, V=Vec8f, the Width of V is 8. In getRowMinima member function, in VectorizedMatrix. I was already using the scalar assignment operator in VectorizedUPGMA_Matrix, which doesn't care about the width of V. Totalling of rows of the D matrix (which happens after clustering, For the row for the new cluster in UPGMA, NJ, and BIONJ), moved to Matrix. This should help performance for UPGMA_Matrix and NJMatrix, a little, on large inputs (It won't help BIONJ, since that was the "donor" class for the code). 
removeRowAndColumn now does row copying rather than pointer reassignment. removeRowAndColumn also moves rows closer together (grouping them all close to the front of the allocated array) periodically. Both of these changes reduce the amount of memory in use and should help performance on very large inputs. On my laptop, the break-even point is somewhere between 5K and 20K rows. AlignmentPairwise::computeDist (the 4 parameter version), now uses the matrix of converted sequences (when initial distance is zero), and it has to use observed or Jukes-Cantor distance. To get answers consistent with the existing code, it needed to know "the non-const frequency" of each site (which is either the frequency, if the site is not constant, or zero, if the site is constant). The non-const frequencies are recorded in the nonConstSiteFrequencies member of AlignmentSummary, and are available via the getConvertedSequenceNonConstFrequencies() member function of PhyloTree. PhyloTree::computeBioNJ now calls omp_set_nested(1) (if _OPENMP is set), so that distance matrix calculations (NJ clustering, or whatever) that are happening in parallel with the write of the distance file, can multithread. Without this call, the distance matrix calculation code goes single threaded (!). Removed commented-out "how long did it take" log-lining code in BioNj::create. Default type used in distance matrix algorithms now float rather than double (because slightly faster). -starttree NJ-R-D requests a double-precision version of the NJ algorithm. In PhyloTree::computeBioNJ, after writing distance file (and perhaps calculating initial tree via a distance matrix algorithm, in parallel), OMP nesting is turned back off again (since it is, and should be, turned off in the rest of IQTree). computeMLDist now copies distance and variance matrices onto the PhyloTree before it asks it to write a distance file. 
Formerly it was writing out the old distance matrix (if there was one), or crashing (if there wasn't an old distance matrix). This change corrects a bug that I introduced in change 54699c0 on 26th June. Now does an #include of <vector> (needed, since it uses std::vector). The missing #include meant it didn't compile on Windows. Ensuring (int) overflow doesn't mess up allocation of, or indexing into, distance matrices. Replacing int with size_t in: Variables in checkZeroDist, computeMLDist (in phyloanalysis.cpp) Variables in Alignment::printDist (pos). Variables in PhyloTree::correctDist (nsqr, pos, i, k). Variables in PhyloTree::computeDist (num_pairs, pos, seq1). Variables in 4-parameter PhyloTree::computeDist (n, nSquared, i). Variables in PhyloTree::computeObsDist (pos, seq1, seq2). Parameters to Alignment member functions: computeStateFreq, getSeqName, isGapOnlySeq Note, Alignment::checkIndenticalSeq should use hashing. Values returned by Alignment member functions: getNSeq Other places there might have been issues...PhyloTree::initializeAllPartialPars While I was there I made some other Alignment member functions return size_t (e.g. getNPattern), changed the parameter to isGapOnlySeq to size_t. Also: denominator calculated in PhyloTree::computeDist_Experimental now matches that used in the non-experimental distance computation (so the JC distances they report are consistent, to about four decimal places). sumVec (vector summation) template function; manual operator strength reduction, and ensuring that the main loop always has an unroll of 4, regardless of (n % 4). By itself, this results in a significant speed-up in ML distance determination (about 20%). Tidying up warning messages (e.g. (void)ing unused return codes; dropping un-used variables); replacing an old-style boost bind with a lambda in NxsCharactersBlock::HandleTokenState. 
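The sumVec entry above mentions a main loop that is always unrolled by 4, regardless of (n % 4). A minimal sketch of that shape follows; `sumUnrolledBy4` is a hypothetical name, and this is not the sumVec template from the commit, only an illustration of keeping four independent accumulators and handling the leftover elements separately.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical sketch of a summation whose main loop always processes four
// elements per iteration; the 0 to 3 leftover elements are added afterwards.
template <class T>
T sumUnrolledBy4(const T* values, std::size_t n) {
    T s0 = T(), s1 = T(), s2 = T(), s3 = T();
    const std::size_t unrolledEnd = n - (n % 4);   // largest multiple of 4 <= n
    for (std::size_t i = 0; i < unrolledEnd; i += 4) {
        // Four independent accumulators avoid a single serial dependency chain.
        s0 += values[i];
        s1 += values[i + 1];
        s2 += values[i + 2];
        s3 += values[i + 3];
    }
    T total = (s0 + s1) + (s2 + s3);
    for (std::size_t i = unrolledEnd; i < n; ++i) {
        total += values[i];                        // remainder: at most 3 items
    }
    return total;
}
```

For floating-point types the result can differ from a strictly left-to-right sum in the last few bits, because the additions are reassociated.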
Removed references to boost::hash_combine and boost::scoped_array (Alignment::adjustHash does the hashing, and a std::vector with a reserve() call is used instead of a scoped array). The reference to the scoped_array #include in utils/bionj2.cpp was already "dead", because it no longer used scoped_array at all. Fix compile issue version 2.0.8 Reverted a change that introduced a bug (accidental declaration of a leafNum local variable shadowed a class member) that was breaking parsimony tree calculation, in PhyloTree::create3TaxonTree, accidentally added in a previous commit. 1. Better parallelisation in distance matrix initial tree construction algorithms (Mostly obtained by adding schedule(dynamic) (!) to #pragma omp parallel for...). 2. The default initial tree algorithm is now NJ-R (not BIONJ). 3. Added BenchmarkingTreeBuilder (which advertises itself as BENCHMARK). It runs each initial tree distance matrix algorithm, with 1, 2, ... n threads. (One after another). I only use this for benchmarking. 4. Added draft "show progress" logic for showing how initial tree construction is progressing (but it is currently disabled, because I'm not happy with boost::display_progress; it doesn't provide an estimate of the time remaining); Search for the SHOW_DISTANCE_MATRIX_PROGRESS symbol. 5. Added a proper comment header to utils/heapsort.h. 1. Default distance-matrix tree construction algorithm now named "RapidNJ" (via setNameOfDefaultTreeBuilder and getNameOfDefaultTreeBuilder member functions). 2. Builder::constructTreeWith now logs a "Computing [X] tree took [n1] sec (of wall-clock time) [n2] sec (of CPU time)" message unless it has been silenced, Regardless of the logging level. 3. BenchmarkingTreeBuilder::constructTreeInMemory silences those messages, via StartTree::BuilderInterface::beSilent() (they mess up the formatting of the Benchmark output otherwise). 
New option --robust-phy PROP (jointly with Rob and Barbara)
Fix crash with robust-phylo for +I,+R models by disabling EM algorithm
New option --robust-median to do median log-likelihood
Avoid forcing dump-checkpoint too much (initial tree in standard bootstrap) that caused writing problem
Version 2.1.0 COVID edition

Extensive changes aimed at speeding up tree loading & parameter optimisation (~2x) & tree search (~2x) by "serialising" their memory access patterns to improve spatial and temporal locality of reference.
1. PhyloTree alignment summaries may now be "borrowed" from another tree that has the same alignment (the relevant member is isSummaryBorrowed; if it is true, this instance doesn't own the summary, it is only "borrowing" a reference to a summary owned by another tree).
2. PhyloTree member functions copyPhyloTree and copyPhyloTreeMixlen take an extra parameter indicating whether the copy is to "borrow" the alignment summary of the original (if it has one). This matters a lot for ratefree.cpp and +R free rate models, and modelmixture.cpp! The temporary copies of the phylo tree that are used during parameter optimization can now re-use the AlignmentSummary of the original, which means they can "linearise" their memory access to sites when they are optimising branch lengths (see changes listed below, e.g. #4, #5, #6, #7).
3. PhyloTree::setAlignment does its "name check" a different way: rather than finding each sequence by name by scanning the tree, it asks MTree::getMapOfTaxonNameToNode for a name to leaf node map, and checks the array of sequence names against the map (updating the id on the node for each hit). The new approach is equivalent but is much faster, O(n ln n) rather than O(n^2). This speeds up tree loads markedly (particularly for large trees), but it matters most for free rate parameter optimization (on middling inputs this was a significant factor: about 10% of parameter optimization time).
This can be turned off by changing the FAST_NAME_CHECK symbol.
4. IQTree::optimizeModelParameters now calls prepareToComputeDistances(), so that AlignmentSummary's matrix of converted sequences will be available to be borrowed via calls to PhyloTree::copyPhyloTree (see change #2 above, and changes #5 through #7 below). Likewise IQTree::doNNISearch (so changes #5, #8 help tree searches too).
5. AlignmentPairwise::computeFunction and AlignmentPairwise::computeFuncDerv can now make use of AlignmentSummary's "matrix of converted sequences" (if it is available) via PhyloTree's accessor methods, e.g. PhyloTree::getConvertedSequenceByNumber(). For this to work as expected, it's necessary for callers to ask AlignmentSummary to construct that matrix *including* even sites where there is no variety at all (added the keepBoringSites parameter on the AlignmentSummary constructor for this).
6. RateMeyerDiscrete::computeFunction and RateMeyerDiscrete::computeFuncDerv likewise. And RateMeyerDiscrete::normalizeRates can make use of the "flat" frequency array exposed by PhyloTree::getConvertedSequenceFrequencies() too.
7. PhyloTree::computePartialLikelihoodGenericSIMD (in phylokernelnew.h) makes use of the matrix of converted sequences (if one is available), in about six (!) different places. In terms of actual effect, this is the most important change in this commit, but it needs changes #1, #2, and #4 committed too, if it is to have any effect. This change speeds up both parameter optimisation and tree searching significantly.
8. As well as inv_eigenvectors, there is now an inv_eigenvectors_transposed (using the transpose makes for some faster multiplications; see change #9 listed below). ModelMarkov::calculateSquareMatrixTranspose is used to calculate the transpose of the inverse eigenvectors. Unpleasant consequence: ModelMarkov::update_eigen_pointers has to take an extra parameter.
Keeping this additional member set correctly is the only thing that forced changes to modelpomomixture.cpp (and .h), modelset.cpp, and modelsubst.h.
9. ModelMarkov::computeTransMatrix and ModelMarkov::computeTransDerv now use (a) calculateExponentOfScalarMultiply and (b) aTimesDiagonalBTimesTransposeOfC to calculate transition matrices (this is quite a bit faster than the Eigen code, since it doesn't bother to construct the diagonal matrix B.asDiagonal()...). (a) and (b) and the supporting functions, calculateHadamardProduct and dotProduct, are (for now) members of ModelMarkov.
10. Minor tweaks to vector processing code in phylokernelnew.h:
(a) dotProductVec: hand-unrolled treatment of the V array;
(b) dotProductPairAdd: treated the last item (in A and B) as the special case, when handling an odd number of items. Possibly the treatment of the AD and BD arrays should be hand-unrolled here, too, but I haven't tried that yet.
(c) dotProductTriple: checking for odd uses & rather than % (faster!)
11. The aligned_free free function (from phylotree.h ?!) does the "pointer null?" check itself, and (because it takes a T*& rather than a T*) can itself set the pointer to nullptr. This means that client code that used to go...
    if (x) { aligned_free(x); x=NULL; }
... can now be simplified to just...
    aligned_free(x);
12. Next to it (in phylotree.h), there is now an ensure_aligned_allocated method. That lets you replace code like this:
    if (!eigenvalues) eigenvalues = aligned_alloc<double>(num_states);
with:
    ensure_aligned_allocated(eigenvalues, num_states);
which is, I reckon, more readable.
13. In many places where there was code of the form...
    if (x) { delete x; }
I have replaced it with delete x (likewise delete [] x). delete always checks for null (it's required to, that's in the C++ standard), and "rolling your own check" merely duplicates the check that delete will do anyway!
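
To illustrate the pointer-by-reference idiom behind changes 11 and 12, here is a simplified sketch. The names release_and_null and ensure_allocated are hypothetical stand-ins, and alignment handling is deliberately left out (plain std::malloc and std::free are used), so this is not the aligned_free or ensure_aligned_allocated from phylotree.h.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <type_traits>

// Hypothetical stand-in for aligned_free: because the pointer is taken by
// reference (T*&), the function can reset the caller's pointer itself.
template <class T>
void release_and_null(T*& p) {
    std::free(p);   // std::free(nullptr) is a no-op, so no check is needed
    p = nullptr;
}

// Hypothetical stand-in for ensure_aligned_allocated: allocates only when
// the pointer is still null, so repeated calls are harmless.
template <class T>
void ensure_allocated(T*& p, std::size_t count) {
    static_assert(std::is_trivial<T>::value,
                  "sketch only supports trivial types such as double");
    assert(count > 0);
    if (p == nullptr) {
        p = static_cast<T*>(std::malloc(count * sizeof(T)));
    }
}
```

Callers then write ensure_allocated(eigenvalues, num_states); and later release_and_null(eigenvalues); with no null checks or manual resets of their own.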
I've made similar "don't bother to check for null" changes in some other files, that I haven't included in this commit (since there aren't any *material* changes to anything in those files).

Removed an #include of a header file that I'd decided wasn't needed in phylotree.cpp, but had left behind in my last commit.

Dynamic scheduling for likelihood calculations (mostly).
1. computeMLDistances no longer writes a distance file (it was usually written *again* in computeBioNJ; see change #2).
2. runTreeConstruction can no longer assume that the distance file has been written by computeMLDistances, so, if iqtree->computeBioNJ has not been called, it must write it (even if params.user_file was false), via a call to iqtree->printDistanceFile.
3. PhyloTree now has a num_packets member, which tracks how many packets to divide work into: it can be the same as num_threads, but is generally more (at present by a factor of 2). Member functions such as getBufferPartialLhSize must allocate per packet rather than per thread. See in particular changes #9, #10 and #11.
4. Removed a little commented-out code from PhyloTree.cpp (and moved for-loop iteration variables that could've been in-loop, but weren't in-loop, in lots of places). Likewise in phylotreesse.cpp.
5. Removed redundant assignments to nullptr (particularly in PhyloTree::deleteAllPartialLh); these aren't needed now because aligned_free sets the pointer to nullptr for you.
6. Client code that set IQTree::num_threads directly now does so via setNumThreads (e.g. in phylotesting.cpp, and also in PhyloTree::optimizePatternRates), because setNumThreads also sets num_packets. For now, num_packets is set to 2*num_threads (see change #9).
7. Removed dead pointer adjustments in the "any size" case in PhyloTree::computePartialParsimonyFastSIMD. These had been left over from before that member function was vectorised (the pointers are recalculated at the start of the next iteration of the loop, so adjusting them is a waste of time).
(Hopefully the compiler was optimizing the adjustments away).
8. Fully unrolled the size 4 case in productVecMat (in phylokernelnew.h).
9. computeBounds chooses sizes for blocks of work (based on the number of packets of work as well as the number of threads to be allocated). For now, it is assumed that the number of packets of work is divisible by the number of threads.
10. PhyloTree::computeTraversalInfo calculates the buffer sizes required in terms of num_packets rather than num_threads.
11. #pragma omp parallel for ... and corresponding for loops are now for packets of work, not threads.
(a) PhyloTree::computeTraversalInfo
(b) PhyloTree::computeLikelihoodDervGenericSIMD (*) (two separate #pragma omp parallel for blocks)
(c) PhyloTree::computeLikelihoodBranchGenericSIMD (*)
(d) PhyloTree::computeLikelihoodFromBufferGenericSIMD (*)
(e) PhyloTree::computeLikelihoodDervMixlenGenericSIMD (*)
(f) PhyloTree::computeNonrevLikelihoodDervGenericSIMD (*) (two separate #pragma omp parallel for blocks)
(g) PhyloTree::computeNonrevLikelihoodBranchGenericSIMD (*) (two separate #pragma omp parallel for blocks)
The ones marked with (*) now use reductions (aimed at double) where possible, rather than #pragma omp critical sections. I've got rid of the private(pin,i,c) stuff by declaring those variables local to the loops that use them. (This means doing horizontal_add per-packet rather than after all the packets are processed.) They all use dynamic (rather than static) scheduling.

1. More logging of how long Alignment-reading steps are taking, when loading alignments from files (in verbose mode).
2. Parallelized the main loop in Alignment::detectSequenceType.
3. Parallelized Alignment::countStatesForSites (the per-thread bit is in Alignment::countStatesForSites).

Removing redundant "is it null before I delete it and set it to null" checks (no need! delete [] does that check for you).

Added HOW_LONG macro, used for log-lining how long individual steps take.
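
The "packets of work" arrangement from the dynamic scheduling commit above (num_packets set to 2*num_threads, a per-packet local sum, and a reduction instead of a critical section) can be sketched roughly as follows. The function name sumInPackets and the way the packet bounds are computed here are assumptions for illustration; the source's computeBounds is not shown in the commit message and may split the work differently.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch: sums values by splitting them into packets of work.
double sumInPackets(const std::vector<double>& values, int thread_count) {
    assert(thread_count >= 1);
    // Assumed here: twice as many packets as threads, as in the commit text.
    const std::ptrdiff_t num_packets = 2 * static_cast<std::ptrdiff_t>(thread_count);
    const std::ptrdiff_t n = static_cast<std::ptrdiff_t>(values.size());
    double total = 0.0;
#ifdef _OPENMP
#pragma omp parallel for schedule(dynamic) reduction(+:total) num_threads(thread_count)
#endif
    for (std::ptrdiff_t packet = 0; packet < num_packets; ++packet) {
        // Loop-local variables replace a private(...) clause.
        const std::ptrdiff_t lower = packet * n / num_packets;
        const std::ptrdiff_t upper = (packet + 1) * n / num_packets;
        double packet_sum = 0.0;
        for (std::ptrdiff_t i = lower; i < upper; ++i) {
            packet_sum += values[i];
        }
        // One contribution per packet; the reduction combines the
        // per-thread totals without a critical section.
        total += packet_sum;
    }
    return total;
}
```

Having more packets than threads gives the dynamic scheduler something to hand out when one thread finishes early, which is the point of separating num_packets from num_threads.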
ModelFactory::optimizeParameters logs more detail of how long it is taking, if iqtree is running in verbose mode.

Indentation and curly-brace tidy up (no material changes).

Tidying up indentation; moving some for-loop iterator variable declarations into the for-loops.

Ignore Visual Studio files (such as precompiled headers).

Added pll/systypes.h (will be used to work around Windows portability issues). Added explicit initialisation of members that weren't initialized in SymTestResult's constructor (in alignment.h). Added some curly braces in a few places in alignment.cpp (not a material change). Added a missing #include, qualified some references to iostream, added a static_cast, in ncl/nxcharactersblock.h. Removed NxsString::IsInVector definition from nxsstring.h (I'll be moving it to nxsstring.cpp, when Visual Studio and git let me). (Right now, I'm fighting with Visual Studio's broken git integration.)

Bug fix: AlignmentSummary's constructor should be using resize (on its sites vector), not reserve. Index out-of-bounds error picked up in Visual Studio.

Updating CMakeLists for 64-bit VS2019 builds (for now, just targeting Windows), using the Clang compiler.
1. The WIN32 symbol is now defined, when the C/C++ compiler is invoked, only in 32-bit Windows builds (WIN64 is set instead in 64-bit builds).
2. CLANG_UNDER_VS is set (and a symbol with the same name is defined when the C/C++ compiler is invoked) when CMake is being invoked from Visual Studio. When it is set:
(a) _UWIN is defined and /GX is passed (when you're calling the VS-friendly wrapper for Clang, /GX is equivalent to passing -fexceptions to the usual version of clang).
(b) /MP (which requests multi-processor compilation) is *not* passed.
(c) /O2 is passed (in release builds). (Seems odd. You'd have thought, /O3, surely?)
(d) In 64-bit builds, -m64 is supplied as a parameter to the compiler.
(e) If the MPI package isn't found (it doesn't seem to be), MPI_DIR is passed to the compiler (you'll need to download MS-MPI and point MPI_DIR at it).
3. Detection of 32-bit versus 64-bit binary is done sooner than it used to be (to facilitate 2(d) above).

Adjusting includes that cause problems for Windows/VS builds (stuff listed here indicates how VS is different):
1. Disabling #include <getopt.h> (e.g. in booster/booster.c)
2. Disabling #include <libgen.h> (e…
was necessary (see #2 through #8 and particularly #5 below), and also drafted some additional "progress-reporting" (see #9 through #11):
1. If -mlnj-only is found on the command-line, Params::compute_ml_tree_only will be set to true (in parseArg(), in utils/tools.cpp).
2. initializeParams doesn't call computeInitialTree if compute_ml_tree_only is set to true.
3. You can't set the root of a tree (if you don't yet have one), a bit later in the same function (and also in IQTree::initSettings).
4. Added PhyloTree::ensureNumberOfThreadsIsSet (and updated repetitive code that was doing what it does, in several other places). This forced some updates in other files, such as main/phylotesting.cpp.
5. Added PhyloTree::ensureModelParametersAreSet (as the same steps need to be carried out somewhat later if there isn't an initial tree before ML distances are calculated). It returns a tree string.
6. In runTreeConstruction, when compute_ml_tree_only is set, negative branches are resolved, and the functions added in #4 and #5 are called, only AFTER the tree has been constructed.
7. In IQTree::initCandidateTreeSet the tree mightn't be a parsimony tree as such (I think if you've combined -nt AUTO and --mlnj-only), but there will be *a* tree. The list of cases wasn't exhaustive any more.
8. Added a distanceFileWritten member variable and a getDistanceFileWritten member function to PhyloTree.
9. (This and the following changes are progress reporting changes).
Added member functions for progress reporting to PhyloTree:
(a) initProgress (pushes where you are on a stack, and starts reporting progress, if there's now one level of progress reporting on the stack)
(b) trackProgress (bumps up progress if the progress stack depth is 1)
(c) hideProgress (called before you write log messages to cout)
(d) showProgress (called again after)
(e) doneProgress (pops, and stops reporting progress, if the last level of progress reporting was just popped)
The supporting member variables are progressStackDepth and progress.
9. IQTree::optimizeNNI uses the functions added in change #9 to report progress. The problem here is that MAXSTEPS is a rather "high" guess: for n sequences it is ~2n, when the best guess for how many iterations there will be, with parallel NNIs, is on the order of ~p, where p is the worst-case "tip-to-tip" path length of the tree (probably a lot less).
10. PhyloTree::testNumThreads also uses the functions added in change #9 to report how many threads it has tried (though, for now, it badly over-reports how long it thinks it will take, because it thinks it will do max_procs iterations and each will take as long as the last, but, really, it'll do max_procs/2, or so, and they go faster and faster as there are more threads in use in later steps: one more each step).
11. PhyloTree::optimizeAllBranches reports progress (via the functions added in change #9). Normally it reports progress during parameter optimisation (because I haven't written "higher-level" progress reporting for that yet).

There are some potential issues though:
1. The special-case code for dealing with "+I+G" rates doesn't yet have a counterpart when compute_ml_tree_only is set (in runTreeConstruction).
2. Likewise, the code for when (params.lmap_num_quartets >= 0) has no counterpart yet when compute_ml_tree_only is set (this too is in runTreeConstruction).
(I haven't figured out how to test the "counterpart" versions of those yet, which is why I haven't written them.)
3. If you pass -nt AUTO I'm not sure how many threads the NJ (or whatever) step will use (I think it's all of them), and the ML distance calculations also "use all the threads" (because the thread count's not set when that code runs either). Both parallelise... well... but I'm not so sure it's a good idea that it hogs all the CPU cores like that.
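
The nesting behaviour of the progress-reporting functions listed in this commit (only the outermost level reports, inner levels just push and pop) can be sketched like this. The class ProgressStackSketch and its members are hypothetical and only model the stack-depth rule described above; they are not the PhyloTree members themselves, and nothing is drawn to the screen.

```cpp
#include <cassert>

// Hypothetical model of the progress stack: only depth 1 reports progress.
class ProgressStackSketch {
public:
    // Push a level; start reporting if this is now the only level.
    void initProgress(double size) {
        ++depth_;
        if (depth_ == 1) {
            total_ = size;
            done_ = 0.0;
            reporting_ = true;
        }
    }
    // Bump progress, but only when called at the outermost level.
    void trackProgress(double amount) {
        if (depth_ == 1) {
            done_ += amount;
        }
    }
    // Temporarily hide or re-show the display around ordinary log output.
    void hideProgress() { visible_ = false; }
    void showProgress() { visible_ = true; }
    // Pop a level; stop reporting once the last level has been popped.
    void doneProgress() {
        assert(depth_ > 0);
        --depth_;
        if (depth_ == 0) {
            reporting_ = false;
        }
    }
    double done() const { return done_; }
    double total() const { return total_; }
    bool isReporting() const { return reporting_; }
    bool isVisible() const { return visible_; }

private:
    int depth_ = 0;
    double total_ = 0.0;
    double done_ = 0.0;
    bool reporting_ = false;
    bool visible_ = true;
};
```

The effect is that a function such as a branch-length optimiser can call initProgress and trackProgress unconditionally: when it is invoked from inside a caller that is already reporting, its calls are ignored rather than corrupting the outer progress count.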
optimisation (~2x) & tree search (~2x) by "serialising" their memory access patterns to improve spatial and temporal locality of reference. 1. PhyloTree alignment summaries may now be "borrowed" from another tree that has the same alignment (the relevant member is isSummaryBorrowed; if it is true, this instance doesn't own the summary, it is only "Borrowing" a reference to a summary owned by another tree). 2. PhyloTree member functions copyPhyloTree and copyPhyloTreeMixlen take an extra parameter indicating whether the copy is to "borrow" a copy of the alignment summary of the original (if it has one). This matters a lot for ratefree.cpp and +R free rate models, and modelmixture.cpp! The temporary copies of the phylo tree that are used during parameter Optimization can now re-use the AlignmentSummary of the original; which means they can "linearise" their memory access to sites, when they are Optimising branch lengths (see changes listed below, e.g. iqtree#4, iqtree#5, iqtree#6, iqtree#7). 3. PhyloTree::setAlignment does its "name check" a different way (rather than finding each sequence by name by scanning the tree, if asks MTree::getMapOfTaxonNameToNode for a name to leaf node map, and checks the array of sequence names against the map (updating the id on the node for each hit). The new approach is equivalent but is much faster, O(n.ln(n)) rather than O(n^2). This speeds up tree loads markedly (particularly for large trees), but it matters most for free rate parameter optimization (on middling inputs this was a significant factor: about ~10% of parameter optimization time). This can be turned off by changing the FAST_NAME_CHECK symbol. 4. IQTree::optimizeModelParameters now calls prepareToComputeDistances() (So that AlignmentSummary's matrix of converted sequences) will be available (to be borrowed, via calls to PhyloTree::copyPhyloTree (see change # 2 above, and changes iqtree#5 through iqtree#7 below). 
Likewise IQTree::doNNISearch (so changes iqtree#5, iqtree#8 help tree searches too). 5. AlignmentPairwise::computeFunction and AlignmentPairwise::computeFuncDerv( can now make use of AlignmentSummary's "Matrix of converted sequences" (if it is available) via PhyloTree's accessor methods, e.g. PhyloTree::getConvertedSequenceByNumber(). For this to work as expected, it's necessary for callers to ask AlignmentSummary to construct that matrix *including* even sites where there is no variety at all (added the keepBoringSites parameter on the AlignmentSummary constructor for this). 6. RateMeyerDiscrete::computeFunction and RateMeyerDiscrete::computeFuncDerv likewise. And RateMeyerDiscrete::normalizeRates can make use of the "flat" frequency array exposed by PhyloTree::getConvertedSequenceFrequencies() too. 7. PhyloTree::computePartialLikelihoodGenericSIMD (in phylokernelnew.h) makes use of the matrix of converted sequences (if one is available), in about six (!) different places. In terms of actual effect, this is the most important change in this commit, but it needs changes #1, #2, and iqtree#4 committed too, if it is to have any effect. This change speeds up both parameter optimisation and tree searching significantly. 8. As well as inv_eigenvectors, there is now an iv_eigenvectors_transposed (Using the transpose makes for some faster multiplications; see change iqtree#9 listed below). ModelMarkov::calculateSquareMatrixTranspose is used to calculate the transpose of the inverse eigen vectors. Unpleasant consequence: ModelMarkov::update_eigen_pointers has to take an extra parameter. Keeping this additional member set correctly is the only Thing that forced changes to modelpomomixture.cpp (and .h), modelset.cpp, and modelsubst.h. 9. 
ModelMarkov::computeTransMatrix and ModelMarkov::computeTransDerv now use (a) calculateExponentOfScalarMultiply and (b) aTimesDiagonalBTimesTransposeOfC to calculate transition matrices (This is quite a bit faster than the Eigen code, since it doesn't bother to construct the diagonal matrix B.asDiagonal()...). (a) and (b) and the supporting functions, calculateHadamardProduct And dotProduct, are (for now) members of ModelMarkov. 10.Minor tweaks to vector processing code in phylokernelnew.h: (a) dotProductVec hand-unrolled treatment of the V array; (b) dotProductPairAdd treated the last item (in A and B) as the special case, when handling an odd number of items. Possibly the treatment of the AD and BD arrays should be hand-unrolled here, too, but I haven't tried that yet. (c) dotProductTriple (checking for odd uses & rather than %) (faster!) 11.The aligned_free free function (from phylotree.h ?!) does the "pointer Null?" check itself, and (because it takes a T*& rather than a T*), can itself set the pointer to nullptr. This means that client code that used to go... if (x) { aligned_free(x); x=NULL; } ... can now be simplified to just... aligned_free(x); 12.Next to it (in phylotree.h), there is now an ensure_aligned_allocated method. That lets you replace code like ... this: if (!eigenvalues) eigenvalues = aligned_alloc<double>(num_states); With: ensure_aligned_allocated(eigenvalues, num_states); which is, I reckon, more readable. 13.In many places where there was code of the form... if (x) { delete x; } I have replaced it with delete x (likewise delete [] x). delete always checks for null (it's required to, that's in the C++ standards), and "Rolling your own check" merely devalues the check that delete will later do! I've made similar "don't bother to check for null" changes in some other files, that I haven't included in this commit (since there aren't any *material* changes to anything in those files).
1. computeMLDistances no longer writes a distance file (it was usually written *again* in computeBioNJ; see change #2). 2. runTreeConstruction can no longer assume that the distance file has been written by computeMLDistances, so (if iqtree->computeBioNJ has not been called, it must write it, even if params.user_file was false, via a call to iqtree->printDistanceFile). 3. PhyloTree now has a num_packets member (which tracks, how many packets to divide work into: it can be the same as num_threads, but is generally more; at present by a factor of 2). Member functions such as getBufferPartialLhSize must allocate per packet rather than per thread. See in particular changes iqtree#9, iqtree#10 and iqtree#11. 4. Removed a little commented-out code from PhyloTree.cpp (And moved for-loop iteration variables that could've been in-loop, but weren't in-loop, in lots of places). (Likewise in phylotreesse.cpp). 5. Removed redundant assignments to nullptr (particularly in PhyloTree::deleteAllPartialLh); these aren't needed now Because aligned_free sets the pointer to nullptr for you. 6. Client code that set IQTree::num_threads directly now does so via setNumThreads (e.g. in phylotesting.cpp) (Also in PhyloTree::optimizePatternRates) (because setNumThreads also sets num_packets). For now, num_packets is set to 2*num_threads (see change iqtree#9). 7. Removed dead pointer adjustments in the "any size" case in PhyloTree::computePartialParsimonyFastSIMD. These had been left over from before that member function was vectorised (The pointers are recalculated at the start of the next Iteration of the loop, so adjusting them is a waste of time). (Hopefully the compiler was optimizing the adjustments away). 8. Fully unrolled the size 4 case in productVecMat (In phylokernelnew.h). 9. computeBounds chooses sizes for blocks of work (Based on the number of packets of work as well as the number of threads to be allocated). 
For now, it is assumed that the number of packets of work is divisible by the number of threads.
10. PhyloTree::computeTraversalInfo calculates the buffer sizes required in terms of num_packets rather than num_threads.
11. #pragma omp parallel for ... and the corresponding for loops are now for packets of work, not threads.
    (a) PhyloTree::computeTraversalInfo
    (b) PhyloTree::computeLikelihoodDervGenericSIMD (*) (Two separate #pragma omp parallel for blocks)
    (c) PhyloTree::computeLikelihoodBranchGenericSIMD (*)
    (d) PhyloTree::computeLikelihoodFromBufferGenericSIMD (*)
    (e) PhyloTree::computeLikelihoodDervMixlenGenericSIMD (*)
    (f) PhyloTree::computeNonrevLikelihoodDervGenericSIMD (*) (Two separate #pragma omp parallel for blocks)
    (g) PhyloTree::computeNonrevLikelihoodBranchGenericSIMD (*) (Two separate #pragma omp parallel for blocks)
    The ones marked with (*) now use reductions (aimed at double) where possible, rather than #pragma omp critical sections. I've got rid of the private(pin,i,c) stuff by declaring those variables local to the loops that use them. (This means doing horizontal_add per-packet rather than after all the packets are processed). They all use dynamic (rather than static) scheduling.
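The per-packet pattern described in changes 9 and 11 can be sketched roughly as follows. This is only an illustration of the idea (more packets than threads, dynamic scheduling, a reduction on a double, loop-local variables instead of private(...)), not the actual kernel code: PacketRange, split_into_packets and sum_over_packets are hypothetical names, and a plain sum stands in for the likelihood computation. Built without OpenMP, the pragma is ignored and the loop simply runs serially.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper type: half-open range [begin, end) of items in one packet.
struct PacketRange {
    std::size_t begin;
    std::size_t end;
};

// Splits item_count items into num_packets contiguous, near-equal ranges.
std::vector<PacketRange> split_into_packets(std::size_t item_count, int num_packets) {
    assert(num_packets >= 1);
    std::vector<PacketRange> ranges;
    ranges.reserve(static_cast<std::size_t>(num_packets));
    for (int p = 0; p < num_packets; ++p) {
        std::size_t begin = item_count * static_cast<std::size_t>(p) / num_packets;
        std::size_t end = item_count * static_cast<std::size_t>(p + 1) / num_packets;
        ranges.push_back({begin, end});
    }
    return ranges;
}

// Sums the values, one packet of work per loop iteration.
double sum_over_packets(const std::vector<double>& values, int thread_count) {
    assert(thread_count >= 1);
    const int num_packets = 2 * thread_count;  // assumption: twice as many packets as threads
    const std::vector<PacketRange> ranges = split_into_packets(values.size(), num_packets);
    double total = 0.0;
    // The loop is over packets, not threads; dynamic scheduling lets a thread
    // that finishes early pick up the next unprocessed packet.
    #pragma omp parallel for reduction(+:total) schedule(dynamic) num_threads(thread_count)
    for (int packet = 0; packet < num_packets; ++packet) {
        double packet_sum = 0.0;  // declared in-loop, so no private(...) clause is needed
        for (std::size_t i = ranges[packet].begin; i < ranges[packet].end; ++i) {
            packet_sum += values[i];
        }
        total += packet_sum;  // per-packet contribution, combined by the reduction
    }
    return total;
}
```

Using a reduction rather than a critical section means each thread accumulates into its own copy of the total, and the copies are combined once at the end of the parallel region.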
was necessary (see #2 through #8 and particularly #5 below), and also drafted some additional "progress-reporting" (see #9 through #11):
1. If -mlnj-only is found on the command-line, Params::compute_ml_tree_only will be set to true (in parseArg(), in utils/tools.cpp).
2. initializeParams doesn't call computeInitialTree if compute_ml_tree_only is set to true.
3. You can't set the root of a tree if you don't yet have one (this matters a bit later in the same function, and also in IQTree::initSettings).
4. Added PhyloTree::ensureNumberOfThreadsIsSet (and updated repetitive code that was doing what it does, in several other places). This forced some updates in other files, such as main/phylotesting.cpp.
5. Added PhyloTree::ensureModelParametersAreSet (as the same steps need to be carried out somewhat later if there isn't an initial tree before ML distances are calculated). It returns a tree string.
6. In runTreeConstruction, when compute_ml_tree_only is set, negative branches are resolved, and the functions added in #4 and #5 are called, only AFTER the tree has been constructed.
7. In IQTree::initCandidateTreeSet the tree mightn't be a parsimony tree as such (I think if you've combined -nt AUTO and --mlnj-only), but there will be *a* tree. The list of cases wasn't exhaustive any more.
8. Added a distanceFileWritten member variable and a getDistanceFileWritten member function to PhyloTree.
9. (This and the following changes are progress reporting changes).
Added member functions for progress reporting to PhyloTree:
    (a) initProgress (pushes where you are on a stack, and starts reporting progress, if there's now one level of progress reporting on the stack)
    (b) trackProgress (bumps up progress if the progress stack depth is 1)
    (c) hideProgress (called before you write log messages to cout)
    (d) showProgress (called again after)
    (e) doneProgress (pops, and stops reporting progress, if the last level of progress reporting was just popped)
    The supporting member variables are progressStackDepth and progress.
9. IQTree::optimizeNNI uses the functions added in change #9 to report progress. (The problem here is that MAXSTEPS is a rather "high" guess: for n sequences it is ~2n, when the best guess for how many iterations there will be, with parallel NNIs, is on the order of ~p, where p is the worst-case "tip-to-tip" path length of the tree - probably a lot less.)
10. PhyloTree::testNumThreads also uses the functions added in change #9 to report how many threads it has tried (though, for now, it badly over-reports how long it thinks it will take, because it thinks it will do max_procs iterations and each will take as long as the last, but, really, it'll do max_procs/2, or so, and they go faster and faster as there are more threads in use in later steps - one more each step).
11. PhyloTree::optimizeAllBranches reports progress (via the functions added in change #9). Normally it reports progress during parameter optimisation (because I haven't written "higher-level" progress reporting for that yet).
There are some potential issues though:
1. The special-case code for dealing with "+I+G" rates doesn't yet have a counterpart when compute_ml_tree_only is set (in runTreeConstruction).
2. Likewise, the code for when (params.lmap_num_quartets >= 0) has no counterpart when compute_ml_tree_only is set, yet (this too is in runTreeConstruction).
(I haven't figured out how to test the "counterpart" versions of those yet, which is why I haven't written them.)
3. If you pass -nt AUTO I'm not sure how many threads the NJ (or whatever) step will use (I think it's all of them), and the ML distance calculations also "use all the threads" (because the thread count's not set when that code runs either). Both parallelise... well... but I'm not so sure it's a good idea that it hogs all the CPU cores like that.
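The nesting behaviour of the progress-reporting functions listed above (only the outermost level actually reports) can be sketched like this. It is a guess at the shape of the logic, not the PhyloTree implementation: the ProgressSketch class, the parameter lists, the workDone/totalWork/reporting/hidden members, and the output format are all assumptions; only the five function names and progressStackDepth come from the change list.

```cpp
#include <cassert>
#include <iostream>
#include <string>

// Hypothetical stand-alone class; in the change list these are PhyloTree members.
class ProgressSketch {
public:
    // Pushes a level; starts reporting only if this is now the sole level.
    void initProgress(double workToDo, const std::string& task) {
        ++progressStackDepth;
        if (progressStackDepth == 1) {
            totalWork = workToDo;
            workDone = 0.0;
            taskName = task;
            reporting = true;
            hidden = false;
        }
    }

    // Bumps progress only when exactly one level is on the stack,
    // so nested (inner) operations don't distort the outer report.
    void trackProgress(double amount) {
        assert(amount >= 0.0);
        if (progressStackDepth == 1) {
            workDone += amount;
            if (reporting && !hidden) {
                std::cerr << taskName << ": " << workDone << " of " << totalWork << "\n";
            }
        }
    }

    // Called before writing ordinary log messages, so output doesn't interleave.
    void hideProgress() { hidden = true; }

    // Called again after the log messages have been written.
    void showProgress() { hidden = false; }

    // Pops a level; stops reporting if the last level was just popped.
    void doneProgress() {
        if (progressStackDepth == 0) {
            return;
        }
        --progressStackDepth;
        if (progressStackDepth == 0) {
            reporting = false;
        }
    }

    int depth() const { return progressStackDepth; }
    double done() const { return workDone; }
    bool isReporting() const { return reporting; }

private:
    int progressStackDepth = 0;
    double totalWork = 0.0;
    double workDone = 0.0;
    bool reporting = false;
    bool hidden = false;
    std::string taskName;
};
```

In this sketch, an outer caller that has already called initProgress keeps control of the report: an inner function's own initProgress/trackProgress/doneProgress calls raise and lower the stack depth without changing the outer count.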
now using name decenttree.cpp
No description provided.