
Processing LPA6 data (5XSZ)

keitaroyam edited this page Jan 23, 2018 · 7 revisions

This page describes how the LPA6 datasets can be processed with KAMO (documentation available in Japanese and English).

References

  • Original paper
    • Taniguchi et al. (2017) "Structural insights into ligand recognition by the lysophosphatidic acid receptor LPA6." Nature doi: 10.1038/nature23448 PDB: 5XSZ

Raw data

  • Available on Zenodo (DOI)
  • Collected on BL32XU, SPring-8
  • EIGER X 9M detector, beam mostly 15 × 9 μm², 1 Å wavelength, 260 or 300 mm camera length
  • 4° or 6° per dataset, 1° per frame (shutterless)
  • 397 datasets collected manually from 44 cryoloops
  • P2₁2₁2₁; a = 55.9, b = 65.0, c = 160.7 Å

How data were processed in the original paper

The GUI command 'kamo' was run with default parameters. That is, XDS (version May 1, 2016, BUILT=20160617) was used for integration, and no prior crystal information was supplied. Note that flat-field correction was not applied to these images, so you need a modified version of eiger2cbf that applies the correction, or an equivalent tool.

359 out of 397 datasets were indexed and integrated, and 350 datasets belonged to the largest group of consistent unit cells:

[ 1] 350 members:
 Averaged P1 Cell= 55.89 64.95 160.55 90.51 90.52 90.43
 Possible symmetries:
   freq symmetry     a      b      c     alpha  beta   gamma reindex
    196 P 1         55.89  64.95 160.55  90.51  90.52  90.43 a,b,c
     26 P 1 2 1     55.89  64.95 160.55  90.51  90.52  90.43 a,b,c
     29 P 1 2 1     55.89 160.55  64.95  89.49  90.43  89.48 a,-c,b
     36 P 1 2 1     64.95  55.89 160.55  89.48  90.51  89.57 b,-a,c
      0 C 1 2 1     86.01  85.36 160.55  89.27  90.05  98.58 -a+b,-a-b,c
      0 C 1 2 1     85.36  86.01 160.55  90.05  90.73  81.42 a+b,-a+b,c
     63 P 2 2 2     55.89  64.95 160.55  90.51  90.52  90.43 a,b,c
      0 C 2 2 2     85.36  86.01 160.55  90.05  90.73  81.42 a+b,-a+b,c
      0 P 4         55.89  64.95 160.55  90.51  90.52  90.43 a,b,c
      0 P 4 2 2     55.89  64.95 160.55  90.51  90.52  90.43 a,b,c

Because P222 was the most frequent symmetry other than P1, P222 was assumed and the XDS_ASCII.HKL files were reindexed to P222.
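The symmetry choice above follows a simple rule: take the entry with the highest freq value, ignoring P 1. The hypothetical helper below (not part of KAMO) just restates that rule for the "Possible symmetries" listing read on standard input. It assumes the column layout shown above, in which the last seven fields of each row are a, b, c, alpha, beta, gamma and the reindex operator, so the space-separated symbol is whatever lies between the freq column and those seven fields.

```sh
#!/bin/sh
# Hypothetical helper (not part of KAMO): restates the rule
# "take the most frequent symmetry other than P 1".
# Reads the "Possible symmetries" listing on stdin. Assumes the last seven
# fields of each row are a, b, c, alpha, beta, gamma and the reindex operator.
most_frequent_non_p1() {
  awk '
    /^ *freq symmetry/ { intab = 1; next }   # header row starts the table
    intab && NF >= 10 && $1 ~ /^[0-9]+$/ {
      sym = $2                                # rebuild symbol, e.g. "P 2 2 2"
      for (i = 3; i <= NF - 7; i++) sym = sym " " $i
      # strict ">" keeps the first row on ties
      if (sym != "P 1" && $1 + 0 > best) { best = $1 + 0; bestsym = sym }
    }
    END { if (bestsym != "") print best, bestsym }'
}
```

Piping the listing above into most_frequent_non_p1 should print the freq and symbol of the P 2 2 2 row. The choice is still a judgement call: a symmetry that wins by frequency alone should be confirmed by the merging statistics.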

To remove outliers with markedly different unit cell parameters, filter_cell.R was used, which removed 49 datasets.
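The exact criterion used by filter_cell.R is not described on this page. As an illustration only, the general idea of rejecting datasets whose cell deviates strongly from the consensus can be sketched as a median-distance cut. The input format (one line per dataset: path, a, b, c in Å), the function names and the tolerance are all hypothetical, and this is not the algorithm implemented in filter_cell.R.

```sh
#!/bin/sh
# Illustrative sketch only: NOT the criterion implemented in filter_cell.R.
# Assumed input: one line per dataset, "path a b c" (cell lengths in Å).

# median_col FILE N: print the median of whitespace-separated column N of FILE
median_col() {
  awk -v n="$2" '{ print $n }' "$1" | LC_ALL=C sort -n |
    awk '{ v[NR] = $1 }
         END { if (NR % 2) print v[(NR + 1) / 2]; else print (v[NR / 2] + v[NR / 2 + 1]) / 2 }'
}

# filter_cells TOL FILE: print paths whose a, b and c all lie within TOL Å of the medians
filter_cells() {
  ma=$(median_col "$2" 2); mb=$(median_col "$2" 3); mc=$(median_col "$2" 4)
  awk -v t="$1" -v ma="$ma" -v mb="$mb" -v mc="$mc" '
    function dev(x, m) { return x > m ? x - m : m - x }   # absolute deviation
    dev($2, ma) <= t && dev($3, mb) <= t && dev($4, mc) <= t { print $1 }' "$2"
}
```

For example, filter_cells 1.0 cells.txt > kept.txt would keep only datasets within 1 Å of the median on every axis (cells.txt and kept.txt are hypothetical file names). A fixed tolerance in Å is an arbitrary choice here. A relative tolerance may suit the long c axis better, and angles are ignored entirely.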

Next, several clustering procedures were tested, including BLEND, CC of intensities, and CC of normalized intensities (|E|²). A subcluster (the second-largest cluster) from the CC(|E|²) clustering gave the best result, with the highest overall CC1/2. It consisted of 241 datasets and is found in ccc_3.2A_framecc_b3/cluster_0309/run_03:

 SUBSET OF INTENSITY DATA WITH SIGNAL/NOISE >= -3.0 AS FUNCTION OF RESOLUTION
 RESOLUTION     NUMBER OF REFLECTIONS    COMPLETENESS R-FACTOR  R-FACTOR COMPARED I/SIGMA   R-meas  CC(1/2)  Anomal  SigAno   Nano
   LIMIT     OBSERVED  UNIQUE  POSSIBLE     OF DATA   observed  expected                                      Corr

     9.56       13708     443       452       98.0%      19.5%     17.7%    13707   23.39     19.8%    99.6*    19    1.086     258
     6.78       23584     739       739      100.0%      26.1%     20.4%    23584   18.45     26.5%    99.3*    17*   1.253     537
     5.54       26771     914       914      100.0%      43.1%     44.2%    26771    8.78     43.8%    97.1*    13    0.903     720
     4.80       34801    1064      1064      100.0%      45.7%     43.7%    34801    8.91     46.5%    97.4*     7    0.915     867
     4.29       40631    1197      1197      100.0%      54.1%     55.2%    40631    7.54     55.0%    96.3*    12*   0.931    1001
     3.92       45332    1310      1310      100.0%      70.9%    100.2%    45332    4.64     71.9%    93.3*    11*   0.705    1115
     3.63       47175    1412      1412      100.0%     104.4%    182.7%    47175    2.70    106.0%    89.4*    13*   0.575    1222
     3.39       52228    1572      1572      100.0%     170.4%    373.8%    52228    1.47    173.0%    84.5*     7    0.464    1360
     3.20       50595    1582      1582      100.0%     356.5%    888.4%    50595    0.68    362.2%    58.8*     6    0.382    1389
    total      334825   10233     10242       99.9%      56.0%     82.9%   334824    6.24     56.8%    99.0*    12*   0.706    8469

This result was produced by the following command:

#!/bin/sh
# settings
dmin=3.2 # high-resolution limit for merging (Å)
clustering_dmin=3.5  # high-resolution limit for CC calculation in clustering (Å)
anomalous=false # true or false
lstin=formerge_goodcell.lst # list of XDS_ASCII.HKL files
use_ramdisk=true # set to false if memory or free space in /tmp is limited
# _______/setting

kamo.multi_merge \
        workdir=ccc_${dmin}A_framecc_b3 \
        lstin=${lstin} d_min=${dmin} anomalous=${anomalous} \
        program=xscale xscale.reference=bmin \
        reject_method=framecc+lpstats rejection.lpstats.stats=em.b \
        clustering=cc cc_clustering.d_min=${clustering_dmin} cc_clustering.b_scale=false cc_clustering.use_normalized=true \
        cc_clustering.min_cmpl=90 cc_clustering.min_redun=2 \
        xscale.use_tmpdir_if_available=${use_ramdisk} \
        batch.engine=sge batch.par_run=merging batch.nproc_each=8 nproc=8 batch.sge_pe_name=par
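After the run, each cluster_XXXX/run_NN directory holds its own merging result, and the subcluster here was chosen for having the highest overall CC1/2. That comparison could be scripted roughly as follows. These are hypothetical helpers, not part of KAMO. They assume each run directory contains an XSCALE.LP with a resolution table like the one above, and that CC(1/2) is the 11th field of its "total" row (true for the layout shown, with anomalous=false). Check this against your own output files before relying on it.

```sh
#!/bin/sh
# Hypothetical helpers, not part of KAMO.

# overall_cchalf FILE: print CC(1/2) of the "total" row in the first
# "SIGNAL/NOISE >= -3.0" table (11th field, trailing "*" removed).
overall_cchalf() {
  awk '/SIGNAL\/NOISE >= -3\.0/ { intab = 1 }
       intab && $1 == "total" { v = $11; sub(/\*$/, "", v); print v; exit }' "$1"
}

# best_run FILE...: print "CC1/2 path" for the file with the highest overall CC(1/2).
# Note: uses global shell variables (best, bestv, lp, v), as POSIX sh has no "local".
best_run() {
  best=""; bestv=-1
  for lp in "$@"; do
    v=$(overall_cchalf "$lp")
    [ -n "$v" ] || continue   # skip files without a parsable table
    # compare as floating-point numbers via awk
    if awk -v a="$v" -v b="$bestv" 'BEGIN { exit !(a > b) }'; then
      best=$lp; bestv=$v
    fi
  done
  [ -n "$best" ] && printf '%s %s\n' "$bestv" "$best"
}
```

A possible invocation, assuming the file name and directory layout, is best_run ccc_3.2A_framecc_b3/cluster_*/run_03/XSCALE.LP. Overall CC1/2 alone can favour small, very consistent clusters, so completeness and multiplicity of the winning cluster are worth checking as well.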