AUTOMATIC PARTICLE PICKING FROM ELECTRON MICROGRAPHS Automatic particle picking from electron micrographs is based on textural methods. It consists of three basic steps which are: PREPARATION TRAINING SELECTION In the PREPARATION part of the program, the preprocessing is done which basically involves windowing out the micrograph into dimensions divisible by 4, inversing the contrast to have bright particles on dark background (in the case of negative stain pictures you do not have to worry), reducing the micrograph four-fold,(which makes the program go faster), low-pass filtering it with a gaussian function and performing a peak search routine to obtain peaks above a certain threshold. In the TRAINING part of the program, there are three important steps.(i) creation of small images of data windows from the original unreduced micrograph using the document file obtained from the peak search routine above. This is done by the command "AT WN" (ii) manual selection and initial assignment of these data windows into three categories (1: particle, 2: noise, and 3: junk) using the CATEGORIZE option in WEB(iii) feature evaluation and discriminant analysis using the "AT SA" command which creates the discriminant function which forms the hypothesis for future selections of particles. In the SELECTION part of the program one uses the command "AT WN" command again but this time we use the discriminant function created above in the TRAINING part to window out only the genuine particles as decided by the function. PRACTICAL CONSIDERATIONS NOTE: 1. Before reducing it four-fold important thing to note is that the input micrograph should have dimensions which are multiples of four i.e., 3200 but not 3201.In the event of the input micrograph having dimensions of 3201 or so, window it out to dimensions four. The reason for windowing it out to dimensions of four is that when one reduces by a factor of four, the command "IP" adds sixteen neighbouring pixels so that the SNR is high. 2. For the low-pass gaussian filtration a guideline to choosing the filter radius used is calculated by the formula: 1/(pi*size) where size refers to the approximate size of the particle in the reduced image. For example in the case of 70S ribosome: particle size is about 250 Angstroms. If the pixel size is 5 Angstroms, then the number of pixels occupied by the particle is 250/5 = 50 pixels. In the reduced image (reduced by four times) the particle size is about: 50/4 = 12.5 pixels. So: 1/pi*size = 0.025 The radius chosen as such for the filtration is typically too strong. But it is a starting point. Eventually we used a filter radius of 0.05. IT looks like this radius should work for all practical purposes. For the gaussian filtration, one needs to use "FQ NP" command which performs filtration in fourier space and involves the mixed radix fourier transforms. ONe just needs to check that the dimensions of the image to be filtered are allowed for mixed radix F.T. (check FT MR to determine what dimensions are not allowed). 3. The peak search routine is called "AT PK". It does a peak search on the filtered image over a specified neighborhood. i.e., it checks for peaks over a region which is what would be occupied by the particle. This helps in avoiding selection of the same particle more than once and also somewhat in picking two particles that might be too close together. Since the filtered image is now the reduced image, the approximate size of the particle calculated is about 12.5 pixels (from 2). For the peak search the neighborhood chosen is an ODD NUMBER and so we could use the nearest higher number which is 13 pixels. When one is using the peak search routine to obtain peaks (which shall be eventually windowed out by the ATWN program to obtain data windows) to be used in the TRAINING part of the program, it is convenient to have a low threshold value (which is the cut-off value and only peaks with values above it are selected) something like:0.6 or 0.65.If one chooses a decent threshold like 0.8 or 0.75 then there are not enough noise data windows and the discriminant program might crash.Thus one ensures that enough particles, noise and junk (clumps or stains, or blobs) data windows are available for analysis by the discriminant program. The discriminant program will crash if you have too little data windows of one group. In the second run or SELECTION part of the program it is convenient to use threshold values of something like 0.7 or 0.75 or even 0.8.This way one can avoid the unnecessary "noise" windows. 4. "AT SA" command is the one which evaluates the features for the different data windows belonging to the different categories i.e., particles, noise, and junk. It inputs them as feature vectors into the discriminant analysis program which casts all the relevant information into a discriminant function which is used for future classifications. Currently, the program assumes that the categories assigned from CATEGORIZE in WEB fall into the following groups i.e., category "1" is particle category "2" is noise category "3" is junk The AT SA command creates the discriminant function called "DISCRIM" which is the one used for future classifications. It reclassifies the input images assigned by the user by using the DISCRIM function that it has created and lets you know how it performs like how efficient it is. THis lets you make any changes that you might want to make in your initial asssignments. 5. "AT WN" is used as part of the SELECTION part as well as the PREPARATION part of the program. In the preparation part of the program it just uses the peaks obtained from ATPK (with a low threshold value) and windows them out (all of them) to obtain data windows to be processed by ATSA. AT WN multiplies the coordinates from the document file by A MULTIPLICATION FACTOR(since the image is reduced by a factor of four) before windowing them out. In the selection part of the program, it basically asks for the document file created from the peak search routine (with a higher threshold value) and windows out only the genuine particles as decided by discriminant function. Remember to do the peak search for the filtered image using a higher threshold like 0.7 or 0.75 or even 0.8 (during the selection part) just to exclude some "noise" windows at that stage itself. Comments: The particle picking program seems to be quite efficient in excluding "noise" windows from the final selection. As for "junk" one needs to make a compromise i.e., If one has a lot of stains or blobs (in the case of frozen-hydrated specimens) in the "junk" category, then the program is very efficient in removing these from the final selection for the micrographs. If one has more or less equal number of stains or blobs and some particle aggregates or clumps then, the program tends to assign some of the clumps back into the particle category and at times it messes up. Personally, I think the program does a wonderful job if one has only stains or blobs in the "junk" category as one can decide for himself/herself later on (inevitably one has to do some manual screening after all the particles are windowed out to make sure that the particles selected meet the requirements to perform a 3D reconstruction) whether they consider a data window to contain a single particle or a clump. The program runs pretty quickly. It takes more time only the first time when the discriminant function has to be created but in subsequent runs when one only wants to window out particles, it runs pretty quickly. *********************************************************************** NEW SPIDER COMMANDS INVOLVED RELATED TO AUTOMATIC PARTICLE PICKING ARE: AT PK AT SA AT WN AT IT EXAMPLES ************************************************************************ ****** PREPARATION PART OF THE PROGRAM IN SPIDER ****** ;window it to dimensions of 4 ; wi /net/ithaca/usr4/agrawal/cont/mic/mic001 _2 3024,3884 1,1 ;contrast inversion to have bright particles on dark background ; ar _2 mic001 ((p1+0.)*-1.) ;interpolate down four-fold ;For an output picture with dimensions exactly four times ;smaller than those of an input picture the adding of sixteen ;neighbouring pixels is done. ; ip mic001 _3 756,971 ;ftmr does not work for certain dimensions ; check in ftmr help ; wi _3 _4 756,968 1,1 ;low-pass gaussian filtration is done ; fq np _4 fl001 3 0.05 en ***************************************************************** ****** TRAINING PART OF THE PROGRAM IN SPIDER ****** ; fl001 is the filtered image obtained in the preparation part of the program (above) ; doc001 is the document file which the program creates which will contain the peaks ; obtained in the reduced image. ; 13 corresponds to the number of neighborhood pixels that you want the program to search ; for peaks (How to obtain it?? check above) ; 0.65 corresponds to the threshold that one needs to use as cut-off value. IN the training ; part of the program it is desirable to use a low threshold value (check above). ; ; ; ; PEAK SEARCH ; AT PK fl001 13 doc001 0.65 ; ; ; ; INITIAL WINDOWING FOR MANUAL ASSIGNMENT OF CATEGORIES BY CATEGORIZE OPTION IN WEB ; ;mic001 is the input micrograph (from batch job above) which has bright particles ;on dark background. You answer yes (y) when it asks you if you want to create windows ; and then quit. ; doc001 is the document file with peaks from above. ; 75,75 is the dimension of the data window that you want to have for the windowed particles. ; init corresponds to the directory where you want to put your created windows which are called ; as prt**** ; ; AT WN mic001 y 75,75 doc001 init/prt**** ; ; ; ; One needs to use CATEGORIZE option in WEB now which displays the windowed images above as ; a montage and then you can assign categories to the windows: ; REMEMBER: 1: Particle 2: Noise and 3: Junk ; The CATEGORIZE option creates a document file which contins two columns after the key ; the first column contains the image number and the second column contains the category. ; In the process of assigning categories to the images one might assign some number to a ; particular image but later on decide to change it. So in that event, one can always go ; go and assign a new category to that image because the following program AT IT takes care ; of such instances. It keeps in record only the last assignment given to the image. Moreover, ; one might tend to assign a category to image #500 first and then to image # 436 and so on in ; a haphazard fashion. This is also sorted by the following program which eventually writes out ; a document file which will put all unique images selected by CATEGORIZE in a sequential ; ascending order and also makes the key (column 0) in a sequential order which is not done ; by CATEGORIZE. ; ; ; doc1 is the file created by CATEGORIZE option. ; AT IT writes the output into file sel001 AT IT doc1 sel001 ; ; ; ; FEATURE EVALUATION AND DISCRIMINANT ANALYSIS ; ;ATSA command requires the input data windows which have already been assigned with categories. ;It also requires the category selection file obtained from ATIT and then it writes the output ; results to a file, for which a name has to be supplied. THis results file basically gives you ;a lot of information regarding how the program performs when it does a reclassification of the ; already assigned data windows and so one can always add more or remove some windows to/from a ;particular group. ATSA puts the discriminant function in a file which can be used for future purposes. ;In the ATWN command this function file should be input. ; AT SA init/prt**** 5,5 sel001 analysis discrim ; ; ; AT PK fl001 13 doc002 0.7 ; ; AT WN mic001 n 75,75 doc002 prt/prt**** doccoord1 5,5 discrim Ref: "Automatic Particle Picking from Electron Micrograph" by K.Ramani Lata, P. Penczek and J. Frank.