Copyright: © EMBL-EBI 2009
This FAQ is split into 2 sections: one mainly useful to administrators, and one for users. Please also make sure you have read the README and Installation Instructions.
If you still haven't found a solution to your problem, please contact interhelp@ebi.ac.uk
What is InterProScan?
InterProScan is a tool that combines different protein signature recognition methods native to the InterPro member databases into one resource with look up of corresponding InterPro and GO annotation.
See & cite: Zdobnov E.M. and Apweiler R. "InterProScan - an integration platform for the signature-recognition methods in InterPro" Bioinformatics, 2001, 17(9): p. 847-8.
InterPro documentation available at: http://www.ebi.ac.uk/interpro/
What are the terms for commercial companies for using InterProScan?
As stated in the InterPro documentation, the manual and database may be copied and redistributed freely, without advance permission, provided that this Copyright statement is reproduced with each copy. The InterProScan software is distributed under the GNU license, as are the included scanning tools (except SignalP and TMHMM, see later). Therefore, you do not need a special license for commercial use, but please cite the resource and keep the following Copyright statement with your installation:
InterPro - Integrated Resource Of Protein Domains And Functional Sites
Copyright © 2001 The InterPro Consortium.
If we use your InterProScan web server, what is your policy regarding confidentiality of protein sequences that we submit?
We cannot necessarily guarantee confidentiality of your sequences submitted via the web server, and we suggest you install a local copy of the software at your site. Everything you need is available from the FTP site (ftp://ftp.ebi.ac.uk/pub/databases/interpro/) in the iprscan directory.
Is there a limit to the number of sequences I can search using InterProScan?
-OR-
How can I run more than 1 sequence through InterProScan?
We have restricted the number of sequences that can be submitted via our web interface per run to one because InterProScan is found to run much faster on single sequences. So, if you wish to submit multiple sequences through the web interface, you will have to do it one at a time.
We had to make this restriction because it was consuming too many resources. (Please note that there is no change in the overall limitation of the number of jobs you can run at one time, just the limitation in the number of sequences you can submit in a single job). You will find, in the end, that you actually get your result much quicker.
We provide other alternatives. You can run your sequences via our InterProScan web service. You can still only submit one at a time, but you will be able to put the submission commands into a script, so it will still appear to you as if you are running multiple sequences at a time.
Please see http://www.ebi.ac.uk/Tools/webservices/WSInterProScan.html for more information.
Alternatively, you can download the standalone version to your local servers and run it there. There are no restrictions on the number of sequences that can be run on this system but please note that it is a very computationally demanding program and you should ensure that your system fulfills the requirements listed in the documentation on the ftp site: ftp://ftp.ebi.ac.uk/pub/software/unix/iprscan/index.html
Is there a limit to the number of sequences I can search using InterProScan?
Unlike the EBI-web version, where you can only search single sequences, there is no limit to the number of sequences you can search using the stand-alone version of InterProScan. It is optimised so that large batch jobs can be "chunked" into smaller pieces and the searches parallelised.
Is there a maximum length for a sequence I can search using InterProScan?
We haven't tested whether there is a maximum length that a sequence can be when running InterProScan, although some users have reported problems which may be as a result of very long sequences. This is something we will look into in further detail in the near future.
Can I run DNA as well as protein sequences through InterProScan?
Due to resource limitations neither the EBI web interface nor web services for InterProScan will be accepting nucleotide sequence submissions until further notice.
To process nucleotide sequences using InterProScan you first need to translate your nucleotide sequence using a tool such as Transeq, and then pick the relevant ORF translations for analysis. We recommend filtering the ORF translations by the following criteria:
1. Sequence length.
Short sequences (<80 aa) are unlikely to have any signature matches, so unless there is additional evidence that the sequence occurs, short sequences can be discarded.
2. Significant hits from sequence similarity searches.
The signatures used by InterProScan are based on known protein sequences, so a useful filtering step is to perform a BLAST or FASTA sequence similarity search with the ORF translations against the UniProtKB or UniParc protein sequence databases and keep only sequences which have hits with e-values <0.001. In the case where an exact match is found to the sequence, you can go directly to the InterPro Matches databases to get the signature matches for the sequence.
The standalone version of InterProScan can perform the translation and ORF length filtering as part of the submission, and is recommended if you need to perform large numbers of analyses and have access to the required resources.
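If you filter ORF translations yourself before submitting them, the length criterion can be sketched in Python roughly as follows. This is only an illustration: the function names are hypothetical, the 80-residue threshold is the one suggested above, and the input is assumed to be plain FASTA text such as a translation tool would write.

```python
from typing import Iterable, Iterator, List, Tuple

MIN_LENGTH = 80  # threshold suggested in this FAQ; adjust to your own data


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_by_length(records: Iterable[Tuple[str, str]],
                     min_length: int = MIN_LENGTH) -> List[Tuple[str, str]]:
    """Keep only records whose sequence has at least min_length residues."""
    return [(h, s) for h, s in records if len(s) >= min_length]
```

The similarity-search criterion is not covered here, since it depends on the output format of whichever BLAST or FASTA program you run.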
Can you run InterProScan and choose which programs you wish to use (e.g. I don't want to run my sequence against Pfam). Will it affect my results?
If you remove any of the programs from InterProScan, that method will not be run on your sequence, meaning you will not get predictions for that method in your results. The methods are generally independent of each other and so those methods that remain should themselves produce the same results. There are a few ways to do this:
How can I retrieve my InterProScan results?
If you enter your email on the EBI web GUI for InterPro, you should receive an email with the results of your InterProScan submission. Alternatively, you can add the link displayed on the webpage whilst the job is running to your favourites ("bookmark it") and return to it later.
I've found that not all matches from .output files are parsed into .raw files. Are you using additional filtering?
Yes. InterProScan implements additional filtering for some of the databases. This is detailed in the README, in the "Results filtering/match status" section
I know that my protein is transmembrane|secreted and so should have a TM domain|signal peptide predicted but it doesn't. Why?
Unfortunately, this is a consequence of how InterProScan works. In order to save time, InterProScan calculates a checksum for your sequence and uses it to look-up pre-computed results in an XML file containing all the matches in the InterPro database. If the signature is not in the file for any reason (i.e. it is not stored by InterPro, or has not been integrated into the database yet), it will not be returned by the look-up, even if it would normally match your sequence if you ran a full search.
The TMHMM and SignalP prediction search algorithms are provided through the web interface at EBI (and as options for the stand-alone version, under license) for your convenience. However, they are not integrated into InterPro and so do not currently exist in the pre-computed results file.
We are working on solutions to this problem, one of which is a new version of the match.xml file which will contain all integrated and unintegrated methods (although initially excluding TMHMM and SignalP predictions) for UniProt proteins. We intend to release this file every couple of weeks to ensure the most up-to-date information for our users.
If you are running the stand-alone version of InterProScan, you can avoid this problem by forcing a search every time you run it, using the "-nocrc" option.
Alternatively, for the web interface, run your SignalP/TMHMM analysis separately from the other applications to ensure that you find all matches.
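To illustrate why a look-up hit can hide predictions, here is a minimal Python sketch of the behaviour described above. It is not InterProScan's code: the checksum function uses MD5 purely as a stand-in for InterProScan's own checksum, the dictionary stands in for the pre-computed matches file, and full_search is a hypothetical callable representing a real run of all methods.

```python
import hashlib
from typing import Callable, Dict, List


def checksum(sequence: str) -> str:
    """Stand-in digest for a sequence (NOT InterProScan's own checksum)."""
    return hashlib.md5(sequence.upper().encode("ascii")).hexdigest()


def find_matches(sequence: str,
                 precomputed: Dict[str, List[str]],
                 full_search: Callable[[str], List[str]],
                 nocrc: bool = False) -> List[str]:
    """Return pre-computed matches when the checksum is known.

    A look-up hit returns only what is stored, so any method missing
    from the pre-computed data is silently absent from the result.
    Setting nocrc=True skips the look-up and always runs the search.
    """
    if not nocrc:
        key = checksum(sequence)
        if key in precomputed:
            return precomputed[key]
    return full_search(sequence)
```

In this sketch, a sequence whose checksum is stored never reaches full_search unless nocrc is set, which mirrors the effect of the "-nocrc" option.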
I am getting different results to the website of a particular member database. Why?
- OR -
Results are missing for a particular HMM database in my stand-alone version of InterProScan but they appear on the EBI version
If a member database has just released a new version of its database, it will take a while for the new models to be integrated into InterPro. This means that for a short while, the results may differ slightly. If you wish, you can update the individual databases on your stand-alone version but be aware this may lead to confusing results, if any of the post-processing of the results has changed between versions.
You should also check that all your model files have been indexed correctly by index_data.pl: if InterProScan cannot find the binary file for an HMM database (usually created during installation) it will fail. You should have files called (e.g.) Gene3D.bin in your iprscan/data/ directory. Run index_data.pl thus:
iprscan/bin/index_data.pl -bin -v
(Also, see the answer above for another reason why results may differ.)
I get fewer results with BlastProDom using my local InterProScan installation than when I use EBI InterProScan. What's wrong?
Actually this is not an error. EBI InterProScan uses a larger file from ProDom (prodom.mul, available for download from the ProDom website) which contains all the ProDom entries. Standalone InterProScan uses a smaller file (prodom.ipr) which contains only the ProDom entries integrated into InterPro. This is why the EBI web site shows some entries marked as "unintegrated": they are not in prodom.ipr because they are not integrated into InterPro.
Also, if you use a different version of BLAST than the one provided with InterProScan (currently, 2.2.6), you may get different results reported.
What does the status flag mean?
InterProScan supports two flags:
"T" means that we believe it is a true positive match and
"F" (false) is created during post-processing but these are not reported in the final output.
As a result of manual curation InterPro supports more values for status. See: InterPro documentation.
The end position of the model is reported as being further than the length of the protein. Is this correct?
We have noticed that with some methods (FingerPrintScan and coils), the reported hit actually extends past the end of the protein.
e.g. Q25520 91CA44BBFB791E45 79 FPrintScan PR00967 ONCOGENEAML1 62 84
Probabilistically-speaking, this is actually OK, and the match is still true (it's just that there's no sequence there to match the rest of the model but the algorithm reports it anyway). We haven't seen this behaviour with any of the HMM-based methods.
So, whatever residue is reported as the "end" of the domain is usually part of the domain. However, if you see matches where the hit extends off the end of the sequence (i.e. the position of the end of the hit is reported as greater than the length of the protein), just ignore the reported "end" and use the last residue of the protein itself as the effective "end" of the domain.
Hopefully this clarifies things. However, if you have further questions about this specific issue, please contact the authors of Coils (and FingerPrintScan).
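The rule for hits that run past the end of the sequence can be sketched in Python as below. The column positions are an assumption read off the example line above (third field taken as the protein length, eighth as the end of the hit); check them against your own raw output before relying on this.

```python
def effective_end(raw_line: str) -> int:
    """Return the hit's end position, capped at the protein length.

    Assumes whitespace-separated fields in which field 3 is the
    sequence length and field 8 is the reported end of the match.
    """
    fields = raw_line.split()
    length = int(fields[2])
    end = int(fields[7])
    return min(end, length)
```

For the example line above (length 79, reported end 84), the effective end would be residue 79.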
Can you help me configure InterProScan so that it works on my particular OS?
We currently can only compile and test the different programs contained in the InterProScan package on the operating systems that are listed as supported (see README). We are happy to provide assistance to compile source code on other, unsupported operating systems whenever possible.
Unfortunately, the InterPro member databases generally do not provide binaries for Windows, meaning that you cannot install EBI InterProScan directly on a Windows server just yet.
However, CBSU at Cornell University have ported InterProScan to run on Windows. Their distribution is available via FTP. You will also need to download Perl and Cygwin in order to run the software. Please direct any queries about using this version of InterProScan to their helpdesk at cbsu@tc.cornell.edu . Please also note that there is no guarantee that the version distributed for Windows is the latest, so always check the documentation beforehand to see if there are discrepancies between version numbers.
Can you help me configure InterProScan so that it works on my particular Queue System?
We currently only test InterProScan on LSF; however, we have provided configuration files for Sun Grid Engine 6 and PBS. We may be able to help you set up an unsupported queue system if necessary.
I have the Torque Resource Manager and InterProScan doesn't work properly. How can I try to fix this?
(Adapted from: http://wiki.nodalpoint.org/iprscan_hacks and http://freelancingscience.com/2008/07/10/configuring-torque-and-interproscan/)
Firstly, when you run Config.pl, answer yes when asked about using a batch queue system and yes to PBS54. Give your OpenPBS server name as a global resource and whichever queue you want to use as a global queue.
When finished, go into the iprscan/conf/ directory. Take a look at the conf files for each application. For instance in coils.conf, you should see lines that look like:
queue=pbs54
resource=foo (your PBS server name)
queue.name=bar (default queue name)
Similar lines should appear in iprscan.conf. Modify pbs54.conf. Look for the lines that begin "asyncsub=qsub" and "syncsub=qsub".
asyncsub=qsub [%optqueue] [%optresource] -o /dev/null -e /dev/null "[%toolcmd]"
syncsub=qsub [%optqueue] [%optresource] -o /dev/null -e /dev/null -I "[%toolcmd]"
Note first of all that in v4.3 of InterProScan and earlier, "-l" with no resource options was specified, which will cause PBS to fail with a 512 error code, so you should remove this if it exists. Second, if you compiled PBS to use scp for copying, you cannot scp to /dev/null on a remote machine. So change the lines to look like this:
asyncsub=qsub [%optqueue][%optresource] -j oe -o /tmp "[%toolcmd]"
syncsub=qsub [%optqueue][%optresource] -j oe -o /tmp -I "[%toolcmd]"
Another problem is that you may find all your iprscan jobs are being directed to one node. Make sure the following attributes are set (in qmgr use 'list s' to view them):
resources_default.neednodes = 1
resources_default.nodect = 1
resources_default.nodes = 1
node_pack = False
Finally, assuming that the Torque binaries (qsub, qdel etc.) are available in the global PATH (e.g. under /usr/local/bin), change the default shell in the environment file pbs54env.sh from #!/bin/sh to #!/bin/bash. If you wish, you can also add other directories to the PATH by specifying them in that file.
I want to have my data and/or tmp directory in a different location - can I do this?
Yes, simply move the directory where you want it to be and create a soft link to the new location from the interproscan home directory (iprscan).
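As a sketch of the move-and-link step in Python (a shell "mv" followed by "ln -s" does the same thing): the function name and arguments here are hypothetical, and you should stop any running InterProScan jobs before moving a directory they are using.

```python
import os
import shutil


def relocate_directory(iprscan_home: str, name: str, new_parent: str) -> str:
    """Move iprscan_home/name under new_parent and leave a soft link behind.

    Returns the new absolute location of the directory.
    """
    source = os.path.join(iprscan_home, name)
    target = os.path.abspath(os.path.join(new_parent, name))
    if os.path.exists(target):
        raise FileExistsError(f"{target} already exists")
    shutil.move(source, target)   # move the real directory
    os.symlink(target, source)    # soft link from the old location
    return target
```

For example, relocate_directory("/path/to/iprscan", "tmp", "/path/to/bigdisk") would leave /path/to/iprscan/tmp as a link to /path/to/bigdisk/tmp (both paths are placeholders).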
Without having a queuing system is it possible to use several hosts to work on a batch of sequences submitted to iprscan?
Without a queuing system you can configure InterProScan to perform scanning of different methods on different hosts (1 scanning method per host) in parallel (which host is defined in the .conf file for each application). In the case of a queuing system you are asked for a submission host.
Do I have to start the jobs from the execution host or is it possible to start them from any host?
You can start it from any host which can rsh to the "execution host". Check that you have access to each host you want to run the applications on using "rsh THE_HOST hostname", for example. You may also need to edit the file called .rhosts on each host to allow connections from other machines.
(e.g. to allow connections from foo.bar.com to blah.co.uk as user john, your .rhosts on blah must contain something like: "foo.bar.com john").
I am using a queueing system. I looked at the configuration file but I don't understand what the special tags "optqueue" and "optresource" are.
"optqueue" is the queue name to use for an application, and "optresource" is the name of a resource to use for this application. If you don't know what these terms mean, contact your system administrator.
Each time an application is launched, InterProScan reads its configuration file and works out how to launch the job: through a queue system or locally. This is given by the "queue" tag of each application. If the queue is a system such as LSF (lsf42.conf), InterProScan launches the application using the LSF command, adding the queue name and resource to the command line if they are specified in the application's configuration file (search for "resource" and "queue.name"). Thus, in the LSF command line, optqueue is replaced by the value of the "queue.name" tag (for this application) and optresource is replaced by the value of the "resource" tag (for this application).
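The substitution described above can be pictured with a short Python sketch. It is an illustration only, not InterProScan's implementation: the function name is hypothetical, the [%tag] syntax is taken from the pbs54.conf lines quoted in this FAQ, and the real program may add scheduler-specific flags around the values.

```python
import re
from typing import Dict


def expand_command(template: str, app_conf: Dict[str, str]) -> str:
    """Fill [%optqueue] and [%optresource] from an application's tags.

    Unset tags expand to an empty string; any other [%tag] is left
    untouched for a later step to fill in.
    """
    values = {
        "optqueue": app_conf.get("queue.name", ""),
        "optresource": app_conf.get("resource", ""),
    }

    def replace(match: "re.Match[str]") -> str:
        return values.get(match.group(1), match.group(0))

    return re.sub(r"\[%(\w+)\]", replace, template)
```

With the tags queue.name=bar and resource=foo, the template 'qsub [%optqueue] [%optresource] -I "[%toolcmd]"' would expand to 'qsub bar foo -I "[%toolcmd]"' in this sketch.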
I am running SGE (Sun Grid Engine) and running a fairly large number of jobs. InterProScan just seems to hang without finishing. How can I fix it?
There are two possible reasons for the above happening.
1) The call to qstat in the bin/qstat_ipr.pl perl script assumes that all jobs from InterProScan will be reported in the output. If you are running a large number of jobs, and the 'finished_jobs' option in your installation is not set high enough, not all the finished jobs will be reported. This leads InterProScan to think it is still waiting for them.
You can either change the chunk size (defined in conf/iprscan.conf) so it is a higher number or change the finished_jobs option (configurable via SGE's qmon program) so that it reports more jobs.
-or-
2) It may be that your SGE is set up so that you cannot submit jobs from compute nodes. There is a main interproscan process ("iprscan") which launches all the searches and performs the processing of results. The problem with the default configuration is that it assumes that you are able to submit jobs from any node to any other node (so if the main process is launched on a node, that process will want to be able to submit jobs from that node to the cluster) and it isn't always the case that you have permissions to do that. Instead, edit the configuration file for the iprscan process ("iprscan.conf") so that it will always run on the machine that you normally submit jobs to the cluster from. Try changing the value of queue= to "local" instead of sge6 and see if that helps (You can leave host.exec blank).
How do I change the URL of the InterProScan server to my domain?
Go to your iprscan/conf directory and, in iprscan.conf, change the "workserver" tag from its old value (http://fido.ebi.ac.uk:4000) to your domain (http://foo.bar.com).
workserver=http://fido.ebi.ac.uk:4000
becomes
workserver=http://foo.bar.com
You can also specify a port to listen on, or use an https URL.
The images and logos are not displaying in the web interface
Make sure you put the correct path to your installation's image folder in the conf/iprscan.conf file so that it is visible from your webserver. All the necessary images are present in the images subdirectory of the iprscan installation.
I have got the smart thresholds files under license. How do I configure iprscan to use them?
You don't need to do much. Just rename THRESHOLDS to smart.thresholds and DESCRIPTIONS to smart.desc and put them in your data directory. Then, in conf/hmmsmart.conf, edit the line which starts "evalue=" and remove the e-value specified (but not the tag).
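If you prefer to script the edit to conf/hmmsmart.conf, a minimal Python sketch could look like this. The function name is hypothetical and the sketch assumes the line begins exactly with "evalue=" as described above; keep a backup of the file before rewriting it.

```python
def clear_evalue(conf_text: str) -> str:
    """Blank the value of every line starting with 'evalue=', keeping the tag."""
    lines = []
    for line in conf_text.splitlines():
        if line.startswith("evalue="):
            line = "evalue="
        lines.append(line)
    trailing_newline = "\n" if conf_text.endswith("\n") else ""
    return "\n".join(lines) + trailing_newline
```

All other lines are passed through unchanged, so the function can be applied to the whole file contents.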
How can I plug in SignalP / TMHMM predictions?
The InterProScan package provides all required scripts/parsers for the methods but you have to contact the authors (software@cbs.dtu.dk) to get the programs and data since they are not publicly available.
Installation:
NOTE: SignalP version 2.0 and newer limit the number of submitted sequences to 4000. If you don't want any restriction, you can try to hack the code by editing the signalp shell script. Search for:
# Maximal number of sequences (command line and WWW):
# Leave it empty if you don't want any limit on the number of input sequences (huge analysis).
# Default (4000)
#MAXSEQ=4000
MAXSEQ=
MAXWWWSEQ=4000
# Check whether $MAXSEQ is set. If not, no limit is applied to the number
# of input sequences.
if [ "$MAXSEQ" != "" ]
then
    if [ "$NSEQ" -gt "$MAXSEQ" ]
    then
        echo signalp: too many sequences, the limit is $MAXSEQ
        exit 1
    elif [ "$WWW" -a \( "$NSEQ" -gt "$MAXWWWSEQ" \) ]
    then
        cat $SIGNALP/doc/wwwtoomany.html | sed 's/_NUM_/'$MAXWWWSEQ'/'
        exit
    fi
fi
Finally, for both, edit the configuration file tags (signalp.conf, tmhmm.conf) to reflect your system. (queue : local/lsf42/pbs54/sge) and host.exec for local implementation OR queue.name and/or resource for queueing systems. To get an idea what this should look like, have a look at the other applications' configuration files.
Please note that you must choose whether or not you will use the NN (neural network) or HMM method of SignalP - you cannot run both together.
See: README for the relevant URLs and references.
Can I use more up to date source databases?
Yes. Just save the updated files under the same names and run index_data.pl manually. The only problem is that you will get more hits from signatures without corresponding InterPro records (referred to as NULL, as they aren't integrated yet).
Also, please note that you must index most of the HMM databases so they are converted to binary format by default. If you wish to avoid this, you must edit the .conf file for that database and remove the ".bin" from the database filename.
Can I use another sequence translation or translation tool than the ones provided?
Well, we developed the new InterProScan around two EMBOSS tools because they are fast, robust, free, maintained and used by a lot of people. But it is up to you if you don't want to use them. You can use your own tools to reformat (not mandatory) or translate your sequences. If you don't reformat your sequences, the headers of certain sequences may cause problems and produce errors with certain applications.
Open the iprscan/conf/iprscan.conf file. Search for "formatcmd". The original format command is contained in a shell script which calls the seqret tool from the EMBOSS package. This script reads the input sequences and writes them to the InterProScan output sequence file, i.e.
formatcmd=[%env IPRSCAN_HOME]/conf/seqret.sh $in > $out
So, to replace it with your own formatting script (let's call it myscript), you can wrap it into a shell script (like we did for seqret - have a look at the shell script to see what we did) or have a simple script taking options. You can put it in the bin directory together with the other scripts (iprscan/bin). Alternatively, [%env IPRSCAN_HOME] can be replaced by another path where your script is located. [%env IPRSCAN_HOME] refers to the IPRSCAN_HOME environment variable, which is the path where InterProScan is installed.
formatcmd=[%env IPRSCAN_HOME]/[bin|conf]/myscript [options?] $in > $out
[options]: any options your script needs in order to take the input sequence as a parameter (e.g. -i, -input, -seqfile ...).
NOTE: Leave "$in" as it is. InterProScan replaces it with the real path of the input sequence file. So the command would be: myscript -i $in > $out.
Your script MUST be able to write the results either to standard output or to a specified file. IN ALL CASES, you must leave "$out" as it is: this is the output file where the results will be written, and InterProScan will replace it with the right file name. So your different cases could be: myscript -i $in [ > $out | -o $out | -output $out | -out $out ...]
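To make the contract for a replacement formatting script concrete, here is a minimal Python sketch of a hypothetical "myscript". It is not a replacement for seqret: the header clean-up rules (keep the first word, replace unusual characters with an underscore) and the -i/-o option names are assumptions chosen for illustration.

```python
import argparse
import re
import sys


def clean_header(header: str) -> str:
    """Keep the first word of a FASTA header and replace unusual characters."""
    words = header.lstrip(">").split()
    token = words[0] if words else "unnamed"
    return ">" + re.sub(r"[^A-Za-z0-9_.|-]", "_", token)


def reformat(lines):
    """Yield cleaned header lines and upper-cased sequence lines."""
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            yield clean_header(line)
        elif line:
            yield line.upper()


def main(argv=None):
    parser = argparse.ArgumentParser(description="Reformat a FASTA file")
    parser.add_argument("-i", dest="infile", required=True)
    parser.add_argument("-o", dest="outfile")  # default: standard output
    args = parser.parse_args(argv)
    with open(args.infile) as handle:
        out = open(args.outfile, "w") if args.outfile else sys.stdout
        try:
            for line in reformat(handle):
                out.write(line + "\n")
        finally:
            if out is not sys.stdout:
                out.close()


# Only run as a command when arguments were actually supplied.
if __name__ == "__main__" and len(sys.argv) > 1:
    main()
```

A script like this supports both of the forms mentioned above, writing to standard output ("myscript -i $in > $out") or to a named file ("myscript -i $in -o $out").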
Open the iprscan/conf/iprscan.conf file. Search for "translatecmd". The original translate command is contained in a shell script which calls the sixpack tool from the EMBOSS package. This script reads the input sequences and writes the translations to the InterProScan translated output.
translatecmd=[%env IPRSCAN_HOME]/conf/sixpack.sh -table $table -orfminsize $trlen -outseq $out $in
So, to use your own translation tool, you can either wrap it in a shell script which calls it with the right options (in case your script needs some) or just write out the whole command line literally. Your script can be installed in either the conf or the bin directory and should be executable. Also, [%env IPRSCAN_HOME] can be replaced by another path where your script is located. [%env IPRSCAN_HOME] refers to the IPRSCAN_HOME environment variable, which is the path where InterProScan is installed.
translatecmd=[%env IPRSCAN_HOME]/[bin|conf]/myscript [options?]
[options]: any options your script needs in order to take the input sequence as a parameter (e.g. -i, -input, -seqfile ...).
NOTE: Leave "$in" and "$out" as they are. InterProScan replaces them with the real paths of the input and output files.
Additionally, you can specify (or not) a translation table (see http://www.ebi.ac.uk/cgi-bin/mutations/trtables.cgi) and a minimum length for the translated sequence (-table and -orfminsize in our example). However, if your script has options for the table code and the minimum ORF length, you must use $table and $trlen as the values of those options, as InterProScan replaces them automatically when reading the configuration file (e.g. myscript -i $in -out $out -tablevalue $table -minlengthforORF $trlen).
If you have problems, contact interhelp@ebi.ac.uk
I would like to use more than one cpu for my hmmer searches using InterProScan. Is it possible to configure it?
Yes, of course. Applications using hmmpfam, hmmscan or hmmsearch are configurable. You just need to change the "cpu_opt" tag in the configuration file of the application you want to update.
Configuration files supporting this option are listed below :
If this tag "cpu_opt" value is empty (default) the --cpu option is not used. NOTE: By default, PIR is set to --cpu 1.
I would like to avoid removing some of the session directories InterProScan created. Can I do it quickly?
Yes :) of course, by editing iprscan/conf/tooldefault.conf. Search for "dirmode" tag and put the dir permissions you want (default is 775). You can change the umask values as well.
I would like to have different rights on the session directory to avoid other people looking in it.
The rights for the date and session directories are stored in the tooldefault.conf file under the "dirmode" tag. The default value for this tag is 777 and the umask is set to 000. This means that anybody can create/remove any directory under iprscan/tmp.
If you want to protect your session directory, open iprscan.conf, edit "usermode" and put in the value you want. If no value is set, iprscan will use the default one stored in tooldefault.conf.
I would like to configure limits to enforce a maximum number of input sequences allowed to be given by the user. Is this possible?
Yes. You can do it when you install InterProScan or if you skipped it during installation you can do it manually editing iprscan.conf file. You can limit:
and also give the default value of the minimum length for an ORF ("minorfsize") when nucleic sequences are translated, as well as the default codon table to use for translation ("codon.table").
I would like to apply a time limit to running jobs. How can I do it?
It is quite simple. Edit iprscan.conf and set "job.time.limit" to 1. Then configure the following two tags: "pollinterval" (sleeping time in seconds between checking jobs) and "maxpollrounds" (number of times jobs are checked).
NOTE: BE AWARE THAT THIS CONFIGURATION IS NOT POSSIBLE WITH INSTALLATIONS USING THE "local" QUEUE!!! (MAY BE ADDED LATER).
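When choosing values for these two tags, a rough calculation may help. The Python sketch below assumes that a job is given up on after "maxpollrounds" checks spaced "pollinterval" seconds apart, so that the effective limit is about their product; this reading is inferred from the tag descriptions above rather than confirmed, and the function names are hypothetical.

```python
import math


def max_wait_seconds(pollinterval: int, maxpollrounds: int) -> int:
    """Approximate time limit implied by the two tags (assumed: product)."""
    return pollinterval * maxpollrounds


def rounds_for_limit(limit_seconds: int, pollinterval: int) -> int:
    """Smallest maxpollrounds that covers limit_seconds at this pollinterval."""
    return math.ceil(limit_seconds / pollinterval)
```

For instance, under this assumption a pollinterval of 10 seconds with 360 rounds would correspond to a limit of about one hour.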
I get the following errors running InterProScan on Ubuntu during the indexing step (which means that no binary HMM files are produced):
"Syntax error: Bad fd number"
Ubuntu has slightly different behaviour than other Linux flavours when it comes to shell scripts. By default, /bin/sh (the shell used in InterProScan) points to /bin/dash. In order to fix this error, you either need to change all scripts to explicitly use /bin/bash or alter /bin/sh so it is a symlink to /bin/bash.
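Before changing anything, you can check what /bin/sh actually resolves to on your machine. A small Python sketch (the function names are hypothetical) that follows the link and reports whether it ends at dash:

```python
import os


def shell_target(path: str = "/bin/sh") -> str:
    """Return the file name that path finally resolves to (links followed)."""
    return os.path.basename(os.path.realpath(path))


def needs_bash_fix(path: str = "/bin/sh") -> bool:
    """True when path resolves to dash, the case described above."""
    return shell_target(path) == "dash"
```

If needs_bash_fix() returns True, one of the two fixes described above is likely to be needed.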
After I have run Config.pl to install InterProScan, when I run the supplied test sequence, as suggested, I get the following errors:
"iprscan submission failed: checkSequences: Cannot get raw entry from iprmatches: getRawEntryFromIprMatches: query iprmatches failed: checkIndex: No such indexed file /<path>/<to>/iprscan/data/match_complete.xml.inx at /<path>/<to>/iprscan/lib/Dispatcher/Tool/InterProScan.pm line..."
This is probably because indexing of your match_complete.xml file has not occurred. This file is used to do a look-up of pre-computed results that have already been calculated for UniProt proteins (in case your sequence matches one of them). The reason for this feature is to speed up InterProScan run time, as it means that the computationally-expensive search steps can therefore be avoided.
It can be resolved in one of two ways:
1. Use the "-nocrc" option to by-pass the look-up and run the searches anyway (will therefore impact performance)
2. Re-index your match_complete.xml file by running:
/<path>/<to>/iprscan/bin/index_data.pl -f match_complete.xml -inx -iforce -v
Run '/<path>/<to>/iprscan/bin/index_data.pl -h' for a full set of help options.
You can check the status of your file indexes (and whether you need to re-index) by running:
/<path>/<to>/iprscan/bin/wget.pl -cli -s
Note that indexing this file currently (Feb-08) requires quite a bit of memory.
InterProScan will not run because Coils (ncoils) jobs print to stderr and fail
This error has been reported by a few of our users. As this is something we are unable to replicate on our test installation of InterProScan, we are unable to provide a good fix. However, a way around this is to redirect any errors (printed to stderr) from the program to /dev/null (the only problem being that this means that no errors will be noticed from ncoils, even when they are valid ones). Do this by editing the cmdline in your conf/ncoils.conf file.
I am having problems with FingerPRINTScan on Linux.
Try changing the binaries to the correct one from ftp://bioinf.man.ac.uk/pub/fingerPRINTScan/binaries/Linux/ .
Make sure that the permissions on iprscan/bin/Linux/FingerPRINTScan are correct for execution, and if they are not, run the command:
chmod 755 FingerPRINTScan
Make sure that this line in iprscan/conf/fprintscan.conf:
binary=[%env IPRSCAN_HOME]/bin/binaries/fingerPRINTScan
refers to the same binary file name that appears at iprscan/bin/Linux/FingerPRINTScan
I am getting messages like: "Can't locate loadable object for module DB_File in @INC..." or "Can't locate auto/DB_File/autosplit.ix in @INC..." or "your libdb and db.h file are not compatible"
Check your installation of perl. You must ensure that it has all the necessary file modules (e.g. DB_File.pm & BerkeleyDB) and that dynaloader actually picks them up.
I get the error "Inappropriate ioctl for device" or "Traceback failed" when running HMMER-based applications. What is going wrong?
You need to download and recompile the HMMER 2.3.2 binaries on your system. The source code can be downloaded from http://hmmer.wustl.edu/.
Do the following:
cd /path/to/
mkdir HMMER
cd HMMER
mkdir bin
mkdir man
ftp # get ftp://ftp.genetics.wustl.edu/pub/eddy/hmmer/2.3.2/hmmer-2.3.2.tar.gz
gunzip hmmer-2.3.2.tar.gz
tar xf hmmer-2.3.2.tar
cd hmmer-2.3.2
./configure --enable-threads --mandir=/path/to/HMMER/man/ --bindir=/path/to/HMMER/bin/
make
make check
make install
cd /path/to/iprscan/bin/binaries/
cp /path/to/HMMER/hmmer-2.3.2/src/hmmpfam .
cp /path/to/HMMER/hmmer-2.3.2/src/hmmsearch .
cp /path/to/HMMER/hmmer-2.3.2/src/hmmconvert .
This should fix both problems. N.B. --enable-threads is not supported on all systems. If this is the case, you need to edit all .conf files so that you do not specify cpu_opt. You will also need to edit iprscan/bin/superfamily.pl so that the number of CPUs is not hardcoded.
The index_data.pl gives errors when it is running. What should I do to stop it doing this?
Errors can be for various reasons:
I have installed the EMBOSS package and also seqret and sixpack but I get errors from InterProScan about them. What is the problem?
Check the environment variables EMBOSS_ROOT and EMBOSS_ACDROOT in the seqret and sixpack shell scripts located in your iprscan/conf directory, and make sure that they point to the right directories. EMBOSS_ROOT is the root directory where your EMBOSS package is installed, and EMBOSS_ACDROOT is the directory where the acd directory (needed by all EMBOSS applications) is located.
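To confirm that both values name real directories, you can export them in your shell with the same values used in the scripts and run a check like the one below. This is a sketch for a Bourne-compatible shell; the function name check_emboss_env is made up for illustration.

```shell
# Check that EMBOSS_ROOT and EMBOSS_ACDROOT are set and name existing
# directories. Returns non-zero if either check fails.
check_emboss_env() {
    status=0
    for name in EMBOSS_ROOT EMBOSS_ACDROOT; do
        # Look up the value of the variable whose name is in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "$name is not set" >&2
            status=1
        elif [ ! -d "$value" ]; then
            echo "$name points to a missing directory: $value" >&2
            status=1
        fi
    done
    return $status
}
```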
InterProScan gives me a report file containing some odd errors from FingerPRINTScan, such as: "ERROR: Calculation has exceeded maximum allowed complexity Fingerprint PRICHEXTENSN matches this sequence.."
This is not a real error, this is just a warning. Don't worry about it.
I get the following error: "supervise: doRawResults: failed to create raw result: Parsing Problem for Panther with location "
This error is usually caused by the Panther data not being installed correctly. Please make sure you have downloaded the PANTHER-specific file from the DATA directory on the FTP site. Untar and unzip it as you would during install. This should fix the problem.
Alternatively, it may be because a temporary directory used by Panther during processing does not exist. Check that you have the directory iprscan/tmp/tmp under your installation.
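The directory check can be done in one step. The sketch below assumes that creating the directory when it is missing is acceptable on your system; the function name ensure_panther_tmp is made up for illustration.

```shell
# Create the tmp/tmp directory under an InterProScan installation
# if it does not exist yet.
# $1: path to the iprscan installation directory
ensure_panther_tmp() {
    dir="$1/tmp/tmp"
    [ -d "$dir" ] || mkdir -p "$dir"
}
```

For example: ensure_panther_tmp /path/to/iprscan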
I am running Solaris and during InterProScan install, the following error is seen: "bin/index_data.pl -bin sh: /dev/null: bad number ERROR: Problem during the conversion of file /[myinstalldir]/Pfam : No such file or directory "
During install, the configuration script now runs all file indexing (so that you can build the indices yourself rather than downloading them). On some versions of Solaris, this call to index_data.pl causes InterProScan to crash. You can fix it by changing the following line:
if(system(\"$path/bin/binaries/hmmconvert -b $f $f.bin >& /dev/null\")){
to:
if(system(\"$path/bin/binaries/hmmconvert -b $f $f.bin\")){
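The ">&" form is a csh/bash shorthand. A strict Bourne shell reads ">& word" as a file-descriptor duplication and expects a number, which is consistent with the "bad number" message above. If you would rather keep the output suppressed than remove the redirect altogether, the portable spelling is "> /dev/null 2>&1". The sketch below shows that form with a hypothetical stand-in command named chatty; it is not taken from the InterProScan distribution.

```shell
# Hypothetical stand-in for hmmconvert: writes to both stdout and stderr.
chatty() {
    echo "to stdout"
    echo "to stderr" >&2
}

# Portable Bourne-shell way to silence both streams: send stdout to
# /dev/null first, then point stderr at the same place.
silence_both() {
    chatty > /dev/null 2>&1
}
```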
I get the following error when trying to run TMHMM: "Errors : cat: output error (0/218 characters written) Broken pipe"
This error comes directly from TMHMM. Check that the model in your directory and the model specified in iprscan/conf/tmhmm.conf have the same name. If not, edit the .conf file.
I get an error when running one of the HMM-based applications (e.g. hmmpfam): "FATAL: you can't init get_wee_midpt with a T"
This is an error from HMMER 2.3.2. It is a rare exception that fortunately can be fixed, although no patch exists yet.
Sean Eddy fix:
If you have plenty of memory, you can work around it just by setting RAMLIMIT to a larger number in config.h and recompiling everything ('make clean; ./configure; make'). It's set up to guarantee <32 MB devoted to alignment algorithms, and that's pretty small on most modern machines. You could set RAMLIMIT to 1000 if you have a GB, and the problem (which is in the small-memory fallback routines) probably wouldn't appear.
Alternatively, if you're up for fixing it yourself: go into core_algorithms.c:P7SmallViterbi(), and go to lines 992-1003, where you'll see this block of code:
if (P7ViterbiSpaceOK(sqlen, hmm->M, mx))
{
    SQD_DPRINTF1((" -- using P7Viterbi on an %dx%d subproblem\n",
                  hmm->M, sqlen));
    P7Viterbi(dsq + ctr->pos[i*2+1], sqlen, hmm, mx, &(tarr[i]));
}
else
{
    SQD_DPRINTF1((" -- using P7WeeViterbi on an %dx%d subproblem\n",
                  hmm->M, sqlen));
    P7WeeViterbi(dsq + ctr->pos[i*2+1], sqlen, hmm, &(tarr[i]));
}
Add an extra chunk, testing for sqlen==1, to change that to:
if (P7ViterbiSpaceOK(sqlen, hmm->M, mx))
{
    SQD_DPRINTF1((" -- using P7Viterbi on an %dx%d subproblem\n",
                  hmm->M, sqlen));
    P7Viterbi(dsq + ctr->pos[i*2+1], sqlen, hmm, mx, &(tarr[i]));
}
else if (sqlen == 1)
{   /* xref bug#h30. P7WeeViterbi() can't take L=1. This
       is a hack to work around the problem, which is rare.
       Attempts to use our main dp mx will violate our
       RAMLIMIT guarantee, so allocate a tiny linear one. */
    struct dpmatrix_s *tiny;
    SQD_DPRINTF1((" -- using P7Viterbi on %dx%d subproblem that P7WeeV should get\n",
                  hmm->M, sqlen));
    tiny = CreatePlan7Matrix(1, hmm->M, 0, 0);
    P7Viterbi(dsq + ctr->pos[i*2+1], sqlen, hmm, tiny, &(tarr[i]));
    FreePlan7Matrix(tiny);
}
else
{
    SQD_DPRINTF1((" -- using P7WeeViterbi on an %dx%d subproblem\n",
                  hmm->M, sqlen));
    P7WeeViterbi(dsq + ctr->pos[i*2+1], sqlen, hmm, &(tarr[i]));
}
Then recompile.
My searches are taking a really long time. Any ideas how I can improve the speed?
Try experimenting with the "chunk" size setting so that you minimise the number of jobs created by InterProScan and the memory footprint of the program.