pdb file format help?

Nov 02, 2010 16:35

I tried a Google search but can't find anything useful... so maybe one of you knows...

petefred November 3 2010, 00:20:53 UTC
Not sure if one can get Excel to output a fixed-width format (blame the Fortran people for PDBs looking like that).
Anyway...

This incantation should work in the terminal, assuming that you have a comma-separated file in infile.csv and have all of the appropriate fields there.

awk -F "," '{printf "ATOM %5i %4s %3s %1s%4s %8.3f%8.3f%8.3f%6.2f%6.2f \n", $2, $3, $4, $5, $6, $7, $8, $9, $10,$11}' infile.csv > outfile.pdb

(replacing infile.csv and outfile.pdb with your input file and desired output file names, respectively. I'm not sure how well it'll show up after LJ mangles it -- you need to make sure that that's all one line)

This assumes that your input file looks something like:
ATOM,1,N,SER,A,2,19.139,23.246,22.51,1,29.99,N
ATOM,2,CA,SER,A,2,19.593,24.58,22.983,1,29.64,C
ATOM,3,C,SER,A,2,18.9,24.918,24.285,1,29.18,C
ATOM,4,O,SER,A,2,18.406,24.026,24.972,1,29.16,O
ATOM,5,CB,SER,A,2,19.251,25.603,21.93,1,30.15,C
ATOM,6,OG,SER,A,2,17.886,25.445,21.603,1,32.87,O
ATOM,7,N,VAL,A,3,18.907,26.197,24.657,1,28.66,N

And in particular, that you have all the same fields as my example and nothing else. It also doesn't handle header lines.
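One way to cope with header lines is to filter them out before formatting. This is only a sketch, assuming the first field always holds the record type (ATOM or HETATM, as in the examples in this thread); the function name keep_atom_rows is made up for illustration.

```shell
# keep_atom_rows: hypothetical helper, not part of any tool.
# Prints only the rows of the CSV file named in $1 whose first
# comma-separated field is ATOM or HETATM, so header lines are dropped.
keep_atom_rows() {
    awk -F "," '$1 == "ATOM" || $1 == "HETATM"' "$1"
}
```

Something like `keep_atom_rows infile.csv > cleaned.csv` would then give a file that the one-liner can be run on.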

If you find yourself having to do this sort of manipulation a lot, it might be better to do it in a program like VMD that is aware of what all the fields are and how to read/write PDBs properly.

petefred November 3 2010, 00:26:24 UTC
(Oops, I started writing that and forgot that you said tab-delimited; replacing awk -F "," with awk -F "\t" will work for a tab-delimited file.)

tt6681_theresat November 3 2010, 00:40:08 UTC
Ok, that *almost* works... unfortunately I'm not at all familiar with the code to fix it (but hey, at least now I can navigate to the right folders and stuff on the command line... I might be learning slowly but I am learning)... it returns the first line of my file in the perfect format... but just the first line.
Any ideas?

(I just exported it as a .csv and used the first option)

petefred November 3 2010, 00:51:23 UTC
Hm, can you just email me the file or post the first couple lines? I'm not sure what "the first option" is -- don't have Excel.

caethan November 3 2010, 01:07:48 UTC
Try pasting a couple lines at http://pastebin.com/ and send us the link.

petefred November 3 2010, 01:34:34 UTC
Ooh, nifty utility -- thanks Brett!

tt6681_theresat November 3 2010, 01:30:43 UTC
HETATM,6064,O,HOH,X,26,-15.059,-10.648,19.926,1,40.41,O
HETATM,6065,O,HOH,X,39,-20.863,9.36,17.53,1,40.88,O
HETATM,6066,O,HOH,X,44,23.616,-6.229,27.799,1,25.37,O
HETATM,6067,O,HOH,X,60,-5.724,11.376,29.546,1,15.78,O
HETATM,6068,O,HOH,X,62,31.982,11.285,27.999,1,38.97,O
HETATM,6069,O,HOH,X,64,-15.31,-9.016,21.564,1,25.28,O
HETATM,6070,O,HOH,X,65,0.694,2.473,25.056,1,45.36,O

petefred November 3 2010, 01:35:38 UTC
Could you try pastebin? It works fine for me when I copy that text into a file and run my little one-liner on it, so I'm concerned there may be some obscure formatting that's being removed by LJ. Or give Brett's python script a try...

tt6681_theresat November 3 2010, 01:41:36 UTC
pastebin is cool! (thanks Brett). here is what I tried.

petefred November 3 2010, 01:53:06 UTC
For the record, I missed a couple spaces -- it should be
awk -F "," '{printf "ATOM %5i %4s %3s %1s%4s %8.3f%8.3f%8.3f%6.2f%6.2f \n", $2, $3, $4, $5, $6, $7, $8, $9, $10, $11}' plaintry.csv > waters_Petermethod.pdb

to get a proper format.

I tried copying your pastebin text into a file and it again gives me the complete, correct output.

Terminal vs. xterm should make no difference. The only thing I can think of, not having a mac handy, is that the way in which you're getting the text into the csv file may be leaving an odd line ending -- macs are funny that way. Maybe try at the terminal:

cat > infile.csv << EOF
(then paste your text)
EOF
(then press enter; EOF has to be on a line by itself)

and then run my command on infile.csv.

Sorry this is turning into a hassle...

tt6681_theresat November 3 2010, 02:06:10 UTC
Hold on. That worked. Just copying and pasting what you wrote. But ONLY copying and pasting it...which makes no sense to me. As in, if I use it once, and then hit the 'up arrow' to repeat the same command, change either the input file or just the output file name, it all of a sudden doesn't work and gives me back the single line (first line) of the file... so confused now. I'll try those other things and let you know (first attempts failed).

tt6681_theresat November 3 2010, 02:10:52 UTC
HaHa! Got it to work. Did your EOF thing. Typed the command from scratch into X11 window (since I don't know how to paste). And this time, it actually generated a file with all of the lines in it. And no, I don't know what made it magically work. (The computer sensed I was going to throw it against the wall?)

THANK YOU for all of your help & patience!!!

petefred November 3 2010, 02:37:54 UTC
No problem; glad it worked.
You shouldn't always need to get the text into the file that way -- there ought to be an option in Excel's CSV export that says something like "Line ending", and has options like CR, LF, CR/LF. You want LF.
That's just my guess on what's wrong at this point, but it is consistent with the symptoms.
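The line-ending guess can also be worked around from the terminal. A sketch, assuming the export used bare CR or CR/LF endings: awk splits records on LF by default, so a CR-only file looks like one very long line, and the fields $2 through $11 of that "line" are exactly those of the first row, which would match the symptom. The function name to_lf is made up here.

```shell
# to_lf: hypothetical helper, not part of any tool.
# Reads the file named in $1 and writes it to stdout with LF line endings,
# whether the input used CR/LF, bare CR, or already LF.
to_lf() {
    # awk strips a CR at the end of each LF-terminated record (CR/LF case);
    # tr then turns any remaining bare CRs into LFs (CR-only case).
    awk '{ sub(/\r$/, ""); print }' "$1" | tr '\r' '\n'
}
```

Something like `to_lf plaintry.csv > infile.csv` followed by the awk command on infile.csv should then see every row. Running `od -c plaintry.csv | head` shows whether the file contains any \r characters in the first place.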

petefred November 3 2010, 02:42:43 UTC
Dammit, LJ is stripping the extra spaces, which is why the format is still wrong. http://pastebin.com/V5m4xJbM has the right spacing for a standards-compliant pdb.
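The pastebin contents are not reproduced here, so what follows is a reconstruction from the published PDB column layout rather than the exact command behind that link. It assumes the 12-column CSV shown earlier in the thread (record type, serial, atom name, residue name, chain, residue number, x, y, z, occupancy, B-factor, element). The function name csv_to_pdb is made up, and the atom-name padding is a simplification that is only right for one-letter elements.

```shell
# csv_to_pdb: hypothetical helper, a sketch rather than a validated converter.
# Reads the 12-column CSV file named in $1 and writes fixed-width records:
# cols 1-6 record type, 7-11 serial, 13-16 atom name, 18-20 residue name,
# 22 chain, 23-26 residue number, 31-54 x/y/z, 55-60 occupancy,
# 61-66 B-factor, 77-78 element.
csv_to_pdb() {
    awk -F "," '
    $1 == "ATOM" || $1 == "HETATM" {
        name = $3
        # Simplification: names shorter than 4 characters start in column 14.
        if (length(name) < 4) name = " " name
        printf "%-6s%5d %-4s %3s %1s%4d    %8.3f%8.3f%8.3f%6.2f%6.2f          %2s\n",
               $1, $2, name, $4, $5, $6, $7, $8, $9, $10, $11, $12
    }' "$1"
}
```

Unlike the one-liner, this takes the record type from the first field instead of hard-coding ATOM, so HETATM rows such as the waters keep their label. Usage would be something like `csv_to_pdb plaintry.csv > waters.pdb`.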

tt6681_theresat November 3 2010, 01:35:41 UTC
By first option I meant your first line of code, the one for .csv instead of tab-delimited. Nothing in Excel.
