-------------
= ChangeLog =
-------------
Changes from 5.6.2 to 5.7.0
* New feature: Evolution of the distributed RHS: exploit sparsity (empty rows)
* New feature: more efficient multithreading (see ICNTL(48))
* New feature: Improved null space detection with rank revealing (ICNTL(56))
* New statistics about pivots and number of swaps
* Reduced time spent in BLR clustering
* OOC: add support for files > 2GB to avoid opening too many files
* Increased various string max sizes: OOC_PREFIX(255), SAVE_PREFIX(255),
WRITE_PROBLEM(1023), OOC_TMPDIR(1023), SAVE_DIR(1023)
* -DMUMPS_SCOTCHIMPORTOMPTHREADS: uses MUMPS OMP threads in SCOTCH
* Improved size of BLR groups with analysis by blocks
* Allow for larger N (avoid need for 2*N to fit in a 32-bit integer)
* Improved performance on small matrices (avoid costly string
comparisons when computing RINFO(7,8) and RINFOG(17,18))
* -DNO_SAVE_RESTORE: suppress SAVE_RESTORE, RINFO(7,8), RINFOG(17,18)
* Avoid calling MPI_IPROBE during solve in case of a single MPI process
* MPI_ALLREDUCE workaround in case it fails for large buffers
* Raise error -55 in case Nloc_RHS>0 on non-working host
* Raise new error -88 in case of error during SCOTCH ordering
* Fixed bound-check error in copies for 64 bit integer orderings and empty graph
* Raise error -69 in case of installation with incorrect integer sizes
* Compilation with -DNOSCALAPACK: suppress dependency on NUMROC
* Fix for METIS > 5.1.0: METIS options indices no longer hardcoded
* Fixed a bug in solve when restarting (JOB=8) with a new process
* Fixed error with exploit sparsity during fwd phase on
reducible matrices and when Scalapack is used
* Makefiles: introduced SHARED_OPT=-shared to allow defining a different option
Changes from 5.6.1 to 5.6.2
* Fixed time for symbolic factorization
* Fixed parallel analysis when PAR=0, 1 MPI per node, #MPI not a power of 2
* -DAVOID_MPI_IN_PLACE avoids MPI_IN_PLACE usage (in case not supported by MPI)
* Fixed a bug in case of parallel analysis+analysis by blocks+scalapack off
* Added missing $(PLAT) for libpord in makefile
* Fixed message printing with error -17
Changes from 5.6.0 to 5.6.1
* Fixed compatibility of parallel analysis with analysis by blocks/out-of-range
* Minor fix of INFO(2) value in a case of INFO(1)=-7 with parallel analysis
* WRITE_PROBLEM: avoid leading space when printing "%%MatrixMarket matrix [..]"
* Fix incorrect error code -74 instead of -76 (save-restore)
* Avoid a remaining hard-coded Fortran unit (save-restore)
* Avoid -Wundef warning in case __STDC_VERSION__ is not defined
* Avoid calling MPI_IPROBE during factorization in case of a single MPI process
(performance improvement on matrices with small fronts)
* During analysis, write time for symbolic factorization
* Update GEMMT link in INSTALL file
Changes from 5.5.1 to 5.6.0
* Analysis by blocks and out-of-range entries compatible with parallel analysis
* Compact workarray S before solution phase (ICNTL(49)=1,2)
* JOB=-4 frees data from factorization and keeps results from analysis
* NEC vector engine version: tuned block low rank and GEMMT usage
* Reduced memory for symbolic datastructures (order-N arrays, arrowheads)
* Use new symbolic factorization (column counts) by default (+allow forest
in case of Schur)
* Improved amalgamation algorithm to reduce the number of tiny nodes
* Discard factors option is now compatible with BLR
* Discard L factors option for LU factorization now available in-core
* Count null negative pivots (INFOG(50))
* Compilation with -DBLR_MT is no longer needed (see also -DBLR_NOOPENMP)
* Improved performance of A-1 entries computation when #MPI > 1
* Fixed risk of hanging in case of OOC error (e.g. disk full)
* Free internal data earlier (e.g., factors in case of failed factorization, etc.)
* Avoid setenv/unsetenv on MinGW environments (Scotch versions >= 7)
* Fix error during ordering when processing matrices of order 1
* Out-of-core: values of ICNTL(22) different from 1 are treated as zero (in-core)
* BLKPTR ignored when ICNTL(15) < 0 (INFO(1:2)=-57 4 no longer occurs)
* BLR OpenMP fac_lr.F fix for possible non-increasing scheduling of loop indices
* Fix: id%LRGROUPS was not nullified in CMUMPS_FREE_ONENTRY_ANA_DRIVER
* Fixed parameter error 1202 in PDGETRS after restoring a factorization
* Workaround for an MPI_SSEND issue with Intel MPI 2021.6
* NEC: Fix MUMPS_WRAP_GINP94 offloading to host
* Shared libraries: minor update in Makefiles
* Fixed missing printings of preprocessing constants in mumps_print_defined.F
* -DNOSCALAPACK: compilation without BLACS/ScaLAPACK (reduced performance)
Changes from 5.5.0 to 5.5.1
* 64-bit default integer version: avoid MPI buffer size > 2^31
(could lead to negative counts in MPI_Recv with Intel ilp64 MPI)
* Use ICNTL(3) unit instead of '*' to print parallel ordering time
* Output symbolic factorization choice (ICNTL(58)) on ICNTL(3) unit
* SCOTCH 7.x on Windows: use _putenv instead of setenv
(unsymmetric case, #MPI >1)
* Fixed parallel analysis in case of 64-bit integers in parmetis and -i8 MUMPS
* Limit recursion depth in mumps_static_mapping.F (limit stack
usage on trees with small nodes at the top and large nodes below)
* Limit subroutine names to 32 characters
* -DWORKAROUNDILP64MPICUSTOMREDUCE introduced (platformMPI and 64-bit default integer)
Changes from 5.4.1 to 5.5.0
* New feature: analysis by blocks with compressed graph (ICNTL(15))
* New feature: symbolic factorization using column counts (ICNTL(58))
* Matrix scaling compatible with Schur complement feature
* Automatic setting of CNTL(3) and CNTL(5) compatible with Schur
complement feature
* Fixed possible error in memory (INFOG(3)) and FLOPS (RINFOG(1))
estimations with Schur feature
* Memory allowed (ICNTL(23)) is per MPI process and compatible with WK_USER
* More compact storage of L factors for symmetric matrices
* Exploit multithreading in SCOTCH versions >= 7.0
* Support for NEC Aurora vector processors
* Possibility to build shared instead of static libraries
* Adjustment of BLR cluster size
* Shorter time for matrix redistribution
* Avoid a possible case of overflow in computation of MAXIS
* Fixed memory estimate of PTRAIW/PTRARW arrays
* -DDPRINT_BACKTRACE_ON_ABORT added in case of call to MUMPS_ABORT()
* More usage of dynamic memory to avoid -9 errors
* Fixed a possible "Internal error 3 in ZMUMPS_BLR_RETRIEVE_PANEL_LORU"
* Change construction of mumps_int_def.h to allow for cross compilation
* Fixed a possible error in deficiency detection on unsymmetric matrices
* Duplicated a few loops during solve (OMP / non-OMP) for gfortran performance
* Fixed compilation issue with -DUPPER and -DMUMPS_WIN32 (5.4.1 regression)
* Find free Fortran units instead of imposing a choice (WRITE_PROBLEM,
save/restore)
* Avoid freeing IRN/JCN when not allocated by MUMPS
* Fixed a potential memory leak in dynamic memory management
* -DALLOC_FROM_C introduced to fix dynamic memory management
with Cray compilers
Changes from 5.4.0 to 5.4.1
* Added feature to dump a matrix/rhs in binary form (see id%WRITE_PROBLEM
and the sketch after this list)
* Fixed error during constrained ordering (ICNTL(12)=3) leading to a segfault
* Fixed type 2 assembly by pieces in case of huge blocks of delayed pivots
* Avoid repeated small allocations in MUMPS_BUILD_SORT_INDEX
* Fixed automatic choice for ICNTL(6) when values are not provided at analysis
* Avoid an access to an uninitialized value (FLOP_FRFRONTS, MPI only, LDLT)
* Fixed error printing in case of error -53
* Fixed an erroneous -9 error in case of large Schur + discard factors
* -DWORKAROUNDINTELILP64OPENMPLIMITATION avoids -i8 compilation warnings
* Limited repeated amalgamations of large child with small parent
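A minimal, illustrative sketch (not part of the original release notes) of
driving the WRITE_PROBLEM feature from the C interface. The struct field name
write_problem and its fixed size are assumed from recent [sdcz]mumps_c.h
headers, and the helper name request_problem_dump is hypothetical; whether the
dump is written in MatrixMarket or binary form depends on the release and on
the settings described in the user guide.

    #include <string.h>
    #include "dmumps_c.h"

    /* Ask MUMPS to dump the matrix (and RHS, if provided) under the given
       file prefix; see the user guide for when the dump is performed.
       Field name write_problem is assumed from recent dmumps_c.h headers. */
    void request_problem_dump(DMUMPS_STRUC_C *id, const char *prefix) {
      strncpy(id->write_problem, prefix, sizeof(id->write_problem) - 1);
      id->write_problem[sizeof(id->write_problem) - 1] = '\0';
    }
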
Changes from 5.3.5 to 5.4.0
* Modified default threshold for null pivot detection (CNTL(3))
* Improved performance of Schur complement computations
* Reduced memory of centralized analysis with distributed matrix
* Improved BLR clustering time during analysis
* Improved BLR performance when CNTL(1)=0.0 and ICNTL(36)=0
* Improved OOC performance in case CNTL(1)=0.0 or SYM=1
* Improved multithreaded performance on matrices with many 2x2 pivots
* Fixed a problem with parallel compilation (in case of: make -j all)
* Fixed some timings during solve that were printed as 0 when PAR=0
* Avoid possible integer overflow on INFOG(10)
* Fixed type 1 assembly by pieces in case static CB moves to dynamic storage
* Use with ITAC: avoid errors (e.g. in DMUMPS_UPPER_PREDICT...) due to
MPI returning nonzero error codes when traces are activated
* Add a JOB=-200 call (no MPI communications, clean local OOC files)
* Fixed "Start processing the root ..." printing in case of Schur
* Fixed SYM_PERM in case of Schur on reducible matrices
Changes from 5.3.4 to 5.3.5
* Fixed 2x2 pivots bug from 5.3.4 release in MPI LDLT factorization
* Fixed ICNTL(8)=-2 option during analysis (code and documentation)
* Include basic fix for make -j
Changes from 5.3.3 to 5.3.4
* Fixed a rare bug (segfault) related to dynamic storage management
on numerically difficult matrices
* Fixed a rare deadlock in BLR for symmetric matrices
* Fixed an uninitialized variable (which could lead to incorrect -19 error)
* Minor fix in userguide (CNTL(1) vs. ICNTL(1) in ICNTL(36) description)
* Fixed a possible runtime issue during solve, related to "TO_PROCESS" array
Changes from 5.3.1 to 5.3.3
* Assume ilp64 MPI interface only applies to Fortran in c_example.c
* Note on gfortran-10 compilation added
* Avoid intent on pointers (F2003-only)
* More robust multithreading for matrix reformatting (arrowheads)
* Fixed ICNTL(31) interpretation in case of repeated analysis
* Fixed multiple mpif.h inclusion (distributed rhs, ifort+openmpi)
* Fixed computation of effectively used memory statistics
* Minor fix in userguide
* Suppressed a !$OMP CRITICAL from solve phase (introduced in 5.3.0)
Changes from 5.3.0 to 5.3.1
* Improved multithreaded performance of BLR backward solve
* Fixed return code in build_mumps_int_def.c + openmp compilation (pgi)
* Forbid a loop vectorization in [sdcz]sol_c.F (segfault with ifort)
Changes from 5.2.1 to 5.3.0
* New feature: distributed right-hand sides
* Improved time for arrowheads construction (single MPI case, mainly)
* C interface: ability to know if MUMPS_INT is 64-bit from include file
* Improved BLR performance when CNTL(1)=0.0 and ICNTL(36)=1
* Fixed INFO(34),INFO(35),INFO(37),INFO(38) on processes with rank > 0
* More portable MPI_IS_IN_PLACE feature in libseq
* Fixed determinant computation when Cholesky ScaLapack is used
* Information on advancement (flops done) on each MPI process
* Allow rhs_sparse and irhs_sparse to be unassociated if nz_rhs=0
* Fixed INFO(30) and INFO(31) computation on MPI processes with rank > 0
* OMP collapsed loops: avoid FIRSTPRIVATE on internal loop bound (for pgi)
* Fix for compilers not freeing local allocatable arrays (64-bit metis)
* Fixed RINFO(5-6) and RINFOG(15-16) metrics (#entries=>#bytes)
* C interface: A_ELT/SCHUR/RHS/REDRHS/RHS_loc/SOL_loc may exceed 2^31 entries
* Local Schur (ICNTL(19)=2 or 3) may now exceed 2^31 entries
* Fixed internal dynamic storage of blocks with more than 2^31 entries
* Fixed a bug in the parallel analysis that limited scalability
Changes from 5.2.0 to 5.2.1
* Fixed a minor "Internal error in CMUMPS_DM_FREEALLDYNAMICCB"
* Default value of ICNTL(14) for MPI executions independent
of SYM + slightly less aggressive than for 5.2.0
* Avoided accesses to uninitialized data in symmetric (2D root, BLR)
* Fixed some incorrect "out" intents for routine arguments
* Avoided CHUNK=0 in OMP loops even if loop not parallelized (pgi)
* Fixed COLSCA&ROWSCA declarations in [SDCZ]MUMPS_ANA_F
* Avoided a possible segfault in presence of NaN's in pivot search
* Minor update to userguide
* Fixed MPI_IN_PLACE usage in libseq (preventing compiler optimization)
Changes from 5.1.2 to 5.2.0
* Memory gains due to low-rank factorization are now effective, low-rank solve
* Internal dynamic storage possible in case static workspace too small
* Improved distributed memory usage and MPI granularity (some sym. matrices)
* Improved granularity (and performance) for symmetric matrices; ability to
use [DSCZ]GEMMT kernel (BLAS extension) if available (see INSTALL)
* A-1 functionality: improved performance due to solution gathering
* Memory peak for analysis reduced (distributed-entry, 64-bit orderings)
* Time for analysis reduced by avoiding some preprocessing (when possible)
* More exploitation of RHS sparsity during forward substitution
* Ability to save/restore an instance to/from disk
* INFO and INFOG dimension extended from 40 to 80
* METIS_OPTIONS introduced for METIS users to define some specific Metis options
* MUMPS can be asked to call omp_set_num_threads with a value provided in ICNTL(16)
(see the sketch after this list)
* Fixed: INFO(16)/INFOG(21)/INFOG(22) did not take into account the extra memory
allocated due to memory allowed (ICNTL(23)>0); INFOG(8) was not correctly set
* Initialize only lower-diagonal part for workers in symmetric type 2 fronts
* Workaround a segfault at the beginning of factorization due to a gfortran-8 bug
* Fixed a bug in weighted matching algorithm when all matrix values are 0
* Portability: include stdint.h instead of int_types.h
* Forced some initializations to make C interface more valgrind-friendly
* Workaround intel 2017 vectorization bug in pivot search (symmetric+MPI+large matrices)
* Stop trying to send messages on COMM_LOAD in case of error (risk of deadlock)
* Avoided most array creation by compiler due to Fortran pointers
* Avoid two cases of integer overflow (KEEP(66), A-1 with large ICNTL(27))
* Fixed a bug with compressed ordering (ICNTL(12)=2) (regression from 5.0.0)
and suppress compressed ordering only in case of automatic setting
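A minimal sketch (illustrative, not from the release notes) of requesting a
thread count through ICNTL(16) via the C interface; icntl is 0-based in C, so
ICNTL(16) corresponds to icntl[15]. The helper name and the nthreads parameter
are hypothetical, and the phase at which ICNTL(16) is read should be checked
in the user guide for the release in use.

    #include <mpi.h>
    #include "dmumps_c.h"

    /* Initialize a MUMPS instance on MPI_COMM_WORLD and request that MUMPS
       sets the number of OpenMP threads through ICNTL(16). */
    void init_with_threads(DMUMPS_STRUC_C *id, MUMPS_INT nthreads) {
      id->comm_fortran = (MUMPS_INT) MPI_Comm_c2f(MPI_COMM_WORLD);
      id->par = 1;   /* host participates in the computations */
      id->sym = 0;   /* unsymmetric matrix */
      id->job = -1;  /* JOB=-1: initialization */
      dmumps_c(id);
      id->icntl[16 - 1] = nthreads;  /* ICNTL(16) */
    }
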
Changes from 5.1.1 to 5.1.2
* Corrected an overestimation of memory (regression from 5.1.0)
* Corrected/extended WORKAROUNDINTELILP64MPI2INTEGER mechanism (see INSTALL)
* Parallel analysis: fixed a bug, limited number of MPI processes on small
problems, and reverted to sequential analysis on tiny problems. This is
to avoid erroneous behavior and failures in the parallel ordering tools.
* Faster BLR clustering on matrices with quasi-dense rows (which are skipped)
* Improved performance of solve phase on very small matrices
* Solve phase with a single MPI process is more thread-safe
* Fixed compilation issue with opensolaris ([SDCZ]MUMPS_TRUNCATED_RRQR)
* Fixed minor bug in BLR factorization (uninitialized timer)
* Corrected minor compiler warnings
* Minor correction to userguide
* Add -DBLR_MT in Intel example Makefile
Changes from 5.1.0 to 5.1.1
* Fix in parallel analysis
* Stabilization of 5.1.0:
- Improved stability of Block-Low-Rank feature
- Corrected an incorrect deallocation of POSINRHSCOMP_COL
- Correction of a case of uninitialized data access in type 2 pivoting
- Suppressed occasional debug trace "write(6,*) " KEEP265= ", KEEP265"
Changes from 5.0.2 to 5.1.0
* New feature: selective 64-bit integers (introduced only where needed)
to process matrices with more than 2^{31}-1 entries.
- mixed 32/64 bit integers for API: NNZ/NNZ_loc 64-bit
(NZ/NZ_LOC kept temporarily for backward compatibility)
- both 32 or 64 bit integer versions of external orderings
(Metis/ParMetis, SCOTCH/pt-SCOTCH, PORD), can be used
- Error -51 when a 32-bit external ordering is invoked on
a graph larger than 2^{31}-1
* New feature: (experimental) factorization based on Block-Low-Rank format,
(ICNTL(35) to activate it and CNTL(7) for low-rank precision)
* Improved performance on numerically hard matrices (LU and LDLt)
* "-DALLOW_NON_INIT" flag has disappeared and needs no longer be used
* Fixed incorrect deallocation in case of JOB=3/ICNTL(26)=1 followed by JOB=2
* Fixed compilation problem with Intel2017 + openMPI in [sdcz]ana_aux_par.F
* Minor correction of memory statistics for solve
* Use 64-bit integers where needed during the solve phase to enable
large number of right-hand-sides (NRHS) in one block
(i.e. ICNTL(27)xN can be larger than 2^{31}-1)
* Improved performance of solve phase
* Allow pivoting thresholds CNTL(1) equal to 1.0
* New error -52: when default Fortran integers are 64 bit, external
orderings should also have 64-bit default integers
* New error -22, INFO(2)=16 when IRN_loc or JCN_loc not associated
while ICNTL(18) is set to 3
* Missing O_BINARY flag was added to open binary files on MINGW systems
* New error -53 that could reflect a matrix structure change between
analysis and factorization
Changes from 5.0.1 to 5.0.2
* Suppress error on id%SCHUR_CINTERFACE in mumps_driver.F when bound
check is enabled and when using 2D block cyclic Schur complement
feature (ICNTL(19)=2 or 3) from C or Matlab interfaces
* Problem of failed assertion in [SDCZ]MUMPS_TREAT_DESCBAND solved (static variable
INODE_WAITED_FOR was not initialized and was not detected by valgrind)
* Correction of very minor memory leaks and access to uninitialized data
* A setting of INFO(1)=-1-17 should have been INFO(1)=-17
* Some settings of INFO(1)=-17 should have been INFO(1)=-20
* Suppress absolute tolerance 10^-20 in pivot selection for SYM=2;
skip 2x2 pivot search if only 1 pivot candidate, avoid pivots
that are subnormal numbers (their inverse is equal to infinity)
* Warning +2 now only occurs when solution is really close to 0
* Occasional bug in OOC and multiple instances solved
* Better selection of equations for bwd errors (W1 and W2) and better
forward error estimates on some machines with 80-bit registers
* Improved users' guide (OOC files cleaning, permutation details, usage
of multithreading, clarification of MegaByte unit)
* Cleaning of asynchronous messages after facto/solve was
revisited and is more robust
* More robust suppression of integer overflow risk during
solve for huge ICNTL(23)
* Improved performance of symbolic factorization in case of matrices with
relatively dense rows and/or with a large number of Lagrange multipliers
* Improved performance of numerical factorization phase during
pivot search for symmetric indefinite matrices
* Use of -xcore-avx2 requires !DEC$ NOOPTIMIZE in MUMPS_BIT_GET4PROC
with current versions of Intel compilers
* Suppressed some temporary array creation and implicit conversions
Changes from 5.0.0 to 5.0.1
* Iterative refinement convergence check corrected (problem
introduced in 5.0.0)
* Used communicator provided by user instead of MPI_COMM_WORLD
in two places (parallel analysis only)
* Matlab interface patched to avoid memory corruption in some
situations (Schur, colsca/rowsca management)
* Corrected a case of error not properly processed which could
cause a segfault instead of a standard "-9" error, or an
abort on "ERR: ERROR: NBROWS > NBROWF"
* Amalgamation without fill forced for single children
* (rare) segfault related to assemblies of delayed columns in
scalapack root node corrected
* Automatic strategy for ordering choice improved
* Further improvements to userguide (mainly iterative
refinement, error analysis, discard factors and forward
elimination during factorization)
* Error -51 also raised in case of integer overflow during
parallel analysis
Changes from 4.10.0 to 5.0.0
* Userguide revisited
* Compatibility with Metis 5.1.0/ParMetis 4.0.3, and with
SCOTCH/pt-SCOTCH 6.0
* Matlab interface updated (scaling vectors (COLSCA, ROWSCA) and
A-1 feature ICNTL(30) are now available)
* Improved sequential and parallel performance for computing selected entries of
A-1 (ICNTL(30))
* Workspace for solve phase, of size B x N per processor (B: block
size controlled by ICNTL(27)) divided by almost #procs. Default
value of B increased.
* Parallel symmetric indefinite elemental matrices: improved numerical behaviour
* Performance of solve phase improved
* Finer control of error analysis and iterative refinement (ICNTL(11))
* Memory for analysis phase (mapping) reduced.
* Better support for 64-bit integers (see INSTALL file)
* Error raised instead of silent integer overflow during analysis (but
not during external orderings)
* Improvements and corrections to parallel analysis (ICNTL(28)),
deterministic graph construction forced with -DDETERMINISTIC_PARALLEL_GRAPH
* Forward elimination (ICNTL(32)) can be done during factorization
* Possibility to use a workspace (WK_USER, LWK_USER) allocated by user
* Very occasional numerical bug in parallel out-of-core case
corrected (thanks to EDF and Samtech for the validation)
* More efficient processing of sparse right-hand-sides (see ICNTL(20))
* Count of entries in factors now includes the parallel root node
* Amalgamation of the assembly tree revisited
* Scaling arrays (COLSCA, ROWSCA) also returned at C interface level
* OOC_NB_FILE_TYPE is part of the MUMPS structure, for
a better management of multiple OOC instances
* Warning +2 set only once (could lead to incorrect +4 in
case of iterative refinement + error analysis)
* Warning +4 has disappeared from documentation (since it
was never occurring -- JCN never modified on exit)
* Error code -16 now raised for the case N=0 even on distributed
matrices (thanks to P. Jolivet for noticing this)
* Use BLAS3 routines for efficiency even in case of BLAS2 operations
(-DMUMPS_USE_BLAS2 allows the use of BLAS2 routines for such
operations)
* Message "problem with NIV2_FLOPS message" should no more occur
(there was still an occasional problem in 4.10.0)
* Improved determinant computation (ICNTL(33)) in case of singular
matrix + scaling (where zero pivots are excluded)
* Trace ' PANEL: INIT and force STRAT_IO=' suppressed
* Some OpenMP directives added (multithreaded BLAS still needed)
* Later allocation of strips of distributed fronts with improved locality
* Front factorization algorithms redesigned (two levels of panels)
* Null pivot (ICNTL(24)) and null space detection (ICNTL(25)) improved
for unsymmetric matrices
* Fortran automatic arrays (e.g. in mumps_static_mapping.F) suppressed
to avoid risks of stack overflows
* Routine names and filenames changed
Changes from 4.9.2 to 4.10.0
* Modified variable names and variable contents in Make.inc/Makefile*
for Windows (Makefile.inc from an older version needs modifications,
please do a diff)
* Option to discard factors during factorization when
not needed (ICNTL(31))
* Option to compute the determinant (ICNTL(33))
* Experimental "A-1" functionality (ICNTL(30))
* Matlab interface updated for 64-bit machines
* Improved users' guide
* Suppressed a memory leak occurring when Scalapack is used
and the user loops on JOB=6 without JOB=-2/JOB=-1 in-between
* Avoid occasional deadlock with huge values of ICNTL(14)
* Avoid problem of -17 error code during solve phase
* Avoid checking association of pointer arrays ISOL_loc and SOL_loc
on procs with no components of solution (small problems)
* Some data structures were not freed at the end of the parallel analysis. Bug fixed.
* Fixed unsafe test of overflow "IF (WFLG+N .LE. WFLG)"
* Large Schur complements sent by blocks if ICNTL(19)=1 (but
options ICNTL(19)=2 or 3 are recommended when Schur complement
is large)
* Corrected problem with sparse RHS + unsymmetric permutation +
transpose solve (problem appeared in 4.9)
* Solved a case where ICNTL(19)=2 or 3 with a small value of SIZE_SCHUR
caused problems in parallel.
* Solved an occasional problem of deallocating a non-allocated local
array PERM in case an error is detected.
* Correction in computation of matrix norm in complex arithmetic
(MPI_COMPLEX was used in place of MPI_REAL in MPI_REDUCE)
* Scaling works on singular matrices
* Compilation problem with -i8 solved
* MUMPS_INT used in OOC layer to facilitate compilation with
64 bit integers
Changes from 4.9.1 to 4.9.2
* Compressed orderings (ICNTL(12)=2) are now compatible with PORD
and PT-Scotch
* Mapping problem on large numbers of MPI processes, leading to
INFOG(1)=-135 on "special" matrices, solved (problem appeared
in 4.9.1)
Changes from 4.9 to 4.9.1
* Balancing on the processors of both work and memory improved.
In a parallel environment memory consumption should be reduced
and performance improved
* Modification of the amalgamation to solve both the problem of
small root nodes and the problem of tiny nodes implying too many
small MPI messages
* Corrected bug occurring on big-endian environments when passing
a 64-bit integer argument in place of a 32-bit one. This was causing
problems in parallel, when ScaLAPACK is used, on IBM machines.
* Internal ERROR 2 in MUMPS_271 now impossible (was
already not happening in practice)
* Solved compiler warnings (or even errors) related to the
order of the declarations of arrays and array sizes
* Parallel analysis: fixed the problem due to the invocation of the size
function on non-allocated pointers, corrected a bug due to initialization
of pointers in the declaration statements, and improved the Makefiles
* Corrected bug in the reallocation of arrays
* Corrected several accesses to uninitialized variables
* Internal Error (4) in OOC (MUMPS_597) no longer occurs
* Suppressed possible printing of "Internal WARNING 1 in CMUMPS_274"
* (Minor) numerical pivoting problem in parallel LDLt solved
* Estimated flops corrected when SYM=2 and Scalapack is used (because
we use LU on the root node, not LDLt, in that case)
* Scaling option effectively used is now returned in INFOG(33) and
ICNTL(8) is no longer modified by the package
* INFO(25) is now correctly documented, new statistic INFO(27) added
Changes from 4.8.4 to 4.9
* Parallel analysis available
* Use of 64-bit integer addressing for large internal workarrays
* overflow in computation of INFO(9) in out-of-core corrected
* fixed Matlab and Scilab interfaces to sparse RHS functionality
* time cost of analysis reduced for "optimisation" matrices
* time to gather solution on processor 0 reduced and automatic copying
of some routine arguments by some compilers resolved.
* extern "C" added to header file mpi.h of libseq for C++ compilers
* Problem with NZ_loc=0 and scaling with ifort 10 solved
* Statistics about current state of the factorization
produced/printed even in case of error.
* Avoid using complex arrays as real workspace (complex versions)
* New error code -40 (instead of -10) when SYM=1 is used and ScaLAPACK
detects a negative pivot
* Solved problem of "Internal error 1" in [SDCZ]MUMPS_264 and [SDCZ]MUMPS_274
* Solved a non-deterministic bug occurring with asynchronous OOC + panels
when uninitialized memory access had value -7777
* Fixed a remaining problem with OOC filenames having more than 150 characters
* Fixed some problems related to the usage of intrinsic functions inside PARAMETER
statements (HP-UX compilers)
* Fixed problem of explicit interface in [SDCZ]MUMPS_521
* Out-of-core strategy from 4.7.3 can be reactivated with -DOLD_OOC_NOPANEL
* Message "problem with NIV2_FLOPS message" should no more occur
* Avoid compilation problem with old versions of gfortran
Changes from 4.8.3 to 4.8.4
* Absolute threshold criterion for null pivot detection added to CNTL(3)
* Problems related to messages "Increase small buffer size ..." solved.
* New option for ICNTL(8) to scale matrices. Default scaling cheaper to
compute
* Problem of filename clash with unsymmetric matrices on Windows
platforms solved
* Allow for longer filenames for temporary OOC files
* Strategy to update blocksize during factorization of frontal
matrices modified to avoid too large messages during pipelined
factorization (that could lead to a -17 error code)
* Messages corresponding to delayed pivots can now be sent
in several packets. This avoids some other cases of error -17
* One rare case of deadlock solved
* Corrected values and sign of INFO(8) and INFO(20)
Changes from 4.8.2 to 4.8.3
* Fix compilation issues on Windows platforms
* Fix ranlib issue with libseq on MacOSX platforms
* Fix a few problems of uninitialized variables
Changes from 4.8.1 to 4.8.2
* Problem of wrong argument in the call to [sdcz]mumps_246 solved
* Limit occurrence of error -11 in the in-core case
* Problem with the use of SIZE on an unassociated pointer solved
* Problem with distributed solution combined with non-working host solved
* Fix generation of MM matrices
* Fix of a minor bug in OOC error management
* Fix portability issues with usleep
Changes from 4.8.0 to 4.8.1
* New distributed scaling is now on by default for distributed matrices
* Error management corrected in case of 32-bit overflow during factorization
* SEPARATOR is now defined as "\\" in Windows version
* Bug fix in OOC panel version
Changes from 4.7.3 to 4.8.0
* Parallel scalings algorithms available
* Possibility to dump a matrix in matrix-market format from both
C and Fortran interfaces
* Correction when dumping a distributed matrix in matrix-market format
* Minor numerical stability problem in some LDL^t parallel
factorizations corrected.
* Memory usage significantly reduced in both parallel and sequential
(limit communication buffers, in-place assembly for assembled matrices,
overlapping during stack).
* Better alignment properties of mumps_struc.h
* Reduced time for static mapping during the analysis phase.
* Correction in dynamic scheduler
* "Internal error 2 in DMUMPS_26" no more occurs, even if SIZE_SCHUR=0
* Corrections in the management of ICNTL(25), some useful code was
protected with -Dtry_null_space and not compiled.
* Scaling arrays are now declared real even in complex versions
* Out-of-core functionality storing factors on disk
* Possibility to tell MUMPS how much memory the package is allowed
to allocate (ICNTL(23))
* Estimated and effective number of entries in factors returned to user
* API change: MAXS and MAXIS have disappeared from the interface,
please use ICNTL(14) and ICNTL(23) to control the memory usage
* Error code -11 raised less often, especially in out-of-core executions
* Error code -14 should no longer occur
* Memory used at the solve phase is now returned to the user
* Possibility to control the blocking size for multiple right-hand sides
(strong impact on performance, in particular for out-of-core executions)
* Solved problems of 32-bit integer overflows during analysis related
to memory estimations.
* New error code -37 related to integer overflows during
factorization
* Compile one single arithmetic with make s, make d, make c or make z,
examples are now in examples/, test/ has disappeared.
* Arithmetic-independent parts are isolated into a libmumps_common.a, that
must now be linked too (see examples/Makefile).
Changes from 4.7.2 to 4.7.3
* detection of null pivots for unsymmetric matrices corrected
* improved pivoting in parallel symmetric solver
* corrected a possible problem when Schur is combined with out-of-core: Schur was split
* fixed incompatible parameter types of intrinsic function MAX in
single precision arithmetic versions.
* minor changes for Windows
* correction with reduced RHS functionality in parallel case
Changes from 4.7.1 to 4.7.2
* negative loads suppressed in mumps distribution
Changes from 4.7 to 4.7.1
* Release number in Fortran interface corrected
* "Negative load !!" message replaced by a warning
Changes from 4.6.4 to 4.7
* New functionality: build reduced RHS / use partial solution
* New functionality: detection of zero pivots
* Memory reduced (especially communication buffers)
* Problem of integer overflow "MEMORY_SENT" corrected
* Error code -20 used when receive buffer too small
(instead of -17 in some cases)
* Erroneous memory access with singular matrices (since 4.6.3) corrected
* Minor bug correction in hybrid scheduler
* Parallel solution step uses less memory
* Performance and memory usage of solution step improved
* String containing the version number now available as a
component of the MUMPS structure
* Case of error "-9964" has disappeared
Changes from 4.6.3 to 4.6.4
* Avoid name clashes (F_INT, ...) when C interface is used and
user wants to include, say, smumps_c.h, zmumps_c.h (etc.) at
the same time
* Avoid large array copies (by some compilers) in distributed
matrix entry functionality
* Default ordering less dependent on number of processors
* New garbage collector for contribution blocks
* Original matrix in "arrowhead form" on candidate processors
only (assembled case)
* Corrected bug occurring rarely, on large number of
processors, and that depended on value of uninitialized
data
* Parallel LDL^t factorization numerically improved
* Less memory allocation in mapping phase (in some cases)
Changes from 4.6.2 to 4.6.3
* Reduced memory usage for symmetric matrices (compressed CB)
* Reduced memory allocation for parallel executions
* Scheduler parameters for parallel executions modified
* Memory estimates (that were too large) corrected with
2D cyclic Schur complement option
* Portability improved (C/Fortran interfacing for strings)
* The situation leading to Warning "RHS associated in MUMPS_301"
no longer occurs.
* Parameters INFO/RINFO from the Scilab/Matlab API are now called
INFOG/RINFOG in order to match the MUMPS user's guide.
Changes from 4.6.1 to 4.6.2
* Metis ordering now available with Schur option
* Schur functionality correctly working with Scilab interface
* Occasional SIGBUS problem on single precision versions corrected
Changes from 4.6 to 4.6.1
* Problem with hybrid scheduler and elemental matrix entry corrected
* Improved numerical processing of symmetric matrices with quasi-dense rows
* Better use of Blacs/Scalapack on processor grids smaller than MPI_COMM_WORLD
* Block sizes improved for large symmetric matrices
Changes from 4.5.6 to 4.6
* Official release with Scilab and Matlab interfaces available
* Correction in 2x2 pivots for symmetric indefinite complex matrices
* New hybrid scheduler active by default
Changes from 4.5.5 to 4.5.6
* Preliminary developments for an out-of-core code (not yet available)
* Improvement in parallel symmetric indefinite solver
* Preliminary distribution of a SCILAB and a MATLAB interface
to MUMPS.
Changes from 4.5.4 to 4.5.5
* Improved tree management
* Improved weighted matching preprocessing:
duplicates allowed, overflow avoided, dense rows
* Improved strategy for selecting default ordering
* Improved node amalgamation
Changes from 4.5.3 to 4.5.4
* Double complex version no longer depends on the
double precision version.
* Simplification of some complex indirections in
mumps_cv.F that were causing difficulties for
some compilers.
Changes from 4.5.2 to 4.5.3
* Correction of a minor problem leading to
INFO(1)=-135 in some cases.
Changes from 4.5.1 to 4.5.2
* correction of two uninitialized variables in
proportional mapping
Changes from 4.5.0 to 4.5.1
* better management of contribution messages
* minor modifications in symmetric preprocessing step
Changes from 4.4.0 to 4.5.0
* improved numerical features for symmetric indefinite matrices
- two-by-two pivots
- symmetric scaling
- ordering based on compressed graph preserving two by two pivots
- constrained ordering
* 2D cyclic Schur better validated
* problems resulting from automatic array copies done by compiler corrected
* reduced memory requirement for maximum transversal features
Changes from 4.3.4 to 4.4.0
* 2D block cyclic Schur complement matrix
* symmetric indefinite matrices better handled
* Right-hand side vectors can be sparse
* Solution can be kept distributed on the processors
* Metis allowed for element-entry
* Parallel performance and memory usage improved:
- load is updated more often for type 2 nodes
- scheduling under memory constraints
- reduced message sizes in symmetric case
- some linear searches avoided when sending contributions
* Avoid array copies in the call to the partial mapping routine
(candidates); such copies appeared with intel compiler version 8.0.
* Workaround MPI_AllReduce problem with booleans if mpich
and MUMPS are compiled with different compilers
* Reduced message sizes for CB blocks in symmetric case
* Various minor improvements
Changes from 4.3.3 to 4.3.4
* Copies of some large CB blocks suppressed
in local assemblies from child to parent
* gathering of solution optimized in solve phase
Changes from 4.3.2 to 4.3.3
* Control parameters of symbolic factorization modified.
* Global distribution time and arrowheads computation
slightly optimized.
* Multiple Right-Hand-Side implemented.
Changes from 4.3.1 to 4.3.2
* Thresholds for symbolic factorization modified.
* Use merge sort for candidates (faster)
* User's communicator copied when entering MUMPS
* Code to free CB areas factorized in various places
* One array suppressed in solve phase
Changes from 4.3 to 4.3.1
* Memory leaks in PORD corrected
* Minor compilation problem on T3E solved
* Avoid taking into account absolute criterion
CNTL(3) for partial LDLt factorization when whole
column is known (relative stability is enough).
* Symbol MPI_WTICK removed from mpif.h
* Bug wrt inertia computation INFOG(12) corrected
Changes from 4.2beta to 4.3
* C INTERFACE CHANGE: comm_fortran must be defined
from the calling program, since MUMPS uses a Fortran
communicator (see the user guide and the sketch after this list).
* LAPACK library is no longer required
* User guide improved
* Default ordering changed
* Return number of negative diagonal elements in LDLt
factorization (except for root node if treated in parallel)
* Rank-revealing options no longer available by default
* Improved parallel performance
- new incremental mechanism for load information
- new communicator dedicated to load information
- improved candidate strategy
- improved management of SMP platforms
* Include files can be used in both free and fixed forms
* Bug fixes:
- some uninitialized values
- problems with size of data on T3E
- minor problems corrected with distributed matrix entry
- count of negative pivots corrected
- AMD for element entries
- symbolic factorization
- memory leak in tree reordering and in solve step
* Solve step uses less memory (and should be more efficient)
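As an illustration of the comm_fortran change above (a sketch, not part of the
original notes): a C caller passes a Fortran communicator handle, for example
obtained with MPI_Comm_c2f, before the initialization call, following the
pattern of the c_example.c program distributed with MUMPS.

    #include <mpi.h>
    #include "dmumps_c.h"

    int main(int argc, char **argv) {
      DMUMPS_STRUC_C id;
      MPI_Init(&argc, &argv);
      /* MUMPS expects a Fortran communicator handle, not an MPI_Comm. */
      id.comm_fortran = (MUMPS_INT) MPI_Comm_c2f(MPI_COMM_WORLD);
      id.par = 1;   /* host takes part in the computations */
      id.sym = 0;   /* unsymmetric matrix */
      id.job = -1;  /* initialize the instance */
      dmumps_c(&id);
      id.job = -2;  /* free the instance */
      dmumps_c(&id);
      MPI_Finalize();
      return 0;
    }
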
Changes from 4.1.6 to 4.2beta
* More precisions available (single, double, complex, double complex).
* Uniprocessor version available (doesn't require MPI installed)
* Interface changes (Users of MUMPS 4.1.6 will have to slightly
modify their codes):
- MUMPS -> ZMUMPS, CMUMPS, SMUMPS, DMUMPS depending on the precision
- the Schur complement matrix should now be allocated by the
user before the call to MUMPS
- NEW: C interface available.
- ICNTL(6)=6 in 4.1.6 (automatic choice) is now ICNTL(6)=7 in 4.2
* Tighter integration of new ordering packages (for assembled matrices),
see the description of ICNTL(7):
- AMF,
- Metis,
- PORD,
* Memory usage decreased and memory scalability improved.
* Problem when using multiple instances solved.
* Various improvements and bug fixes.
Changes from 4.1.4 to 4.1.6
* Modifications/tuning done by P. Amestoy during his
visit to NERSC.
* Additional memory and communication statistics.
* minor problems solved.
Changes from 4.0.4 to 4.1.4
* Tuning on Cray T3e (and minor debugging)
* Improved strategy for asynchronous
communications
(irecv during factorization)
* Improved Dynamic scheduling
and splitting strategies
* New maximal transversal strategies
* New option (default): automatic decision
for scaling and maximum transversal
-------------------
= Release history =
-------------------
Release 5.7.0 : April 2024
Release 5.6.2 : October 2023
Release 5.6.1 : July 2023
Release 5.6.0 : April 2023
Release 5.5.1 : July 2022
Release 5.5.0 : April 2022
Release 5.4.1 : August 2021
Release 5.4.0 : April 2021
Release 5.3.5 : October 2020
Release 5.3.4 : September 2020
Release 5.3.3 : June 2020
Release 5.3.2[.x] : May 2020, internal/experimental
Release 5.3.1 : April 2020
Release 5.3.0 : April 2020
Release 5.2.1 : June 2019
Release 5.2.0 : April 2019
Release 5.1.2 : October 2017
Release 5.1.1 : March 2017
Release 5.1.0 : Feb 2017, internal release (limited diffusion)
Release 5.0.2 : July 2016
Release 5.0.1 : July 2015
Release 5.0.0 : February 2015
Release 4.10.0 : May 2011
Release 4.9.2 : November 2009
Release 4.9.1 : October 2009
Release 4.9 : July 2009
Release 4.8.4 : December 2008
Release 4.8.3 : September 2008
Release 4.8.2 : September 2008
Release 4.8.1 : August 2008
Release 4.8.0 : July 2008
Release 4.7.3 : May 2007
Release 4.7.2 : April 2007
Release 4.7.1 : April 2007
Release 4.7 : April 2007
Release 4.6.4 : January 2007
Release 4.6.3 : June 2006
Release 4.6.2 : April 2006
Release 4.6.1 : February 2006
Release 4.6 : January 2006
Release 4.5.6 : December 2005, internal release
Release 4.5.5 : October 2005
Release 4.5.4 : September 2005
Release 4.5.3 : September 2005
Release 4.5.2 : September 2005
Release 4.5.1 : September 2005
Release 4.5.0 : July 2005
Releases 4.3.3 -- 4.4.3 : internal releases
Release 4.3.2 : November 2003
Release 4.3.1 : October 2003
Release 4.3 : July 2003
Release 4.2 (beta) : December 2002
Release 4.1.6 : March 2000
Release 4.0.4 : Wed Sept 22, 1999 <-- Final version from PARASOL