Commit f68ff34

Updating code base to match latest version 1.1.6 on PyPi.
1 parent dd25dae commit f68ff34

16 files changed

+292
-24
lines changed

CHANGELOG

+2
Original file line numberDiff line numberDiff line change
@@ -25,3 +25,5 @@ and this project adheres to [Semantic Versioning](http://semver.org/).
2525
- Added CategoryNode object on SubGraph which replaces much of the functionality of the previous version's SubGraph.representative_node data member.
2626
- CategoryNode contains a list called representative_nodes, which allows the subgraph to be rooted to multiple nodes if needed.
2727

28+
## [1.1.6] - 2019-11-25
29+
- Fixed an issue related to setting max field_size_limit for CSV and creating OverflowErrors on some systems.
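The diff does not show the fix itself. As a minimal sketch of one common way to avoid this error (an assumption, not necessarily what GOcats 1.1.6 does): `csv.field_size_limit(sys.maxsize)` raises `OverflowError` on platforms where a C `long` is narrower than `sys.maxsize`, so the limit can be lowered until the call succeeds. The function name `set_max_csv_field_size` is hypothetical.

```python
import csv
import sys


def set_max_csv_field_size():
    """Raise the csv field size limit as far as the platform allows.

    Hypothetical helper, not taken from the GOcats source.
    """
    limit = sys.maxsize
    while True:
        try:
            csv.field_size_limit(limit)
            return limit
        except OverflowError:
            # The value does not fit in a C long on this system; try a smaller one.
            limit //= 10


applied_limit = set_max_csv_field_size()
```

After the call, `csv.field_size_limit()` with no argument returns the limit that was actually applied.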

LICENSE

+1-1
@@ -1,4 +1,4 @@
1-
A modified Clear BSD License
1+
The Clear BSD License
22

33
Copyright (c) 2017, Eugene W. Hinderer III, Hunter N.B. Moseley
44
All rights reserved.

README.md

+171
@@ -0,0 +1,171 @@
1+
# GOcats
2+
3+
GOcats is an Open Biomedical Ontology (OBO) parser and categorizing utility--currently specialized for the Gene Ontology (GO)--which can sort ontology terms into conceptual categories that a user provides.
4+
5+
## Important note: cite your use of GOcats
6+
See the CITATION file for instructions.
7+
8+
## Getting Started
9+
10+
It is recommended that you clone this repository into a project directory within the home directory.
11+
12+
You will also need a local copy of the Gene Ontology OBO flat file, available here: http://purl.obolibrary.org/obo/go.obo
13+
14+
GOcats is able to map annotations within Gene Association Files (GAFs) into categories specified by the user. These categories are specified by creating a csv file where column 1 is the name of the category and column 2 is a list of keywords associated with that category concept, separated by semicolons (;). See GOcats/gocats/exampledata/examplecategories.csv as an example of 25 subcellular location categories. In its current version, this will be the main use of GOcats.
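To make the category file layout concrete, here is a minimal sketch that reads a two-column CSV of this shape into a dictionary. It is not the GOcats parser: the function name `read_categories` and the sample rows are made up for illustration, and the sketch assumes the file has no header row.

```python
import csv
import io


def read_categories(handle):
    """Map each category name (column 1) to its keyword list (column 2).

    Hypothetical helper; keywords are split on semicolons as described above.
    """
    categories = {}
    for row in csv.reader(handle):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        name = row[0].strip()
        keywords = [word.strip() for word in row[1].split(";") if word.strip()]
        categories[name] = keywords
    return categories


# Made-up sample rows in the described format.
example = io.StringIO(
    "mitochondrion,mitochondria;mitochondrial\n"
    "nucleus,nucleus;nuclear\n"
)
categories = read_categories(example)
```

With a real file, `read_categories(open(path, newline=""))` would be the equivalent call.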
15+
16+
If you would like to perform the analyses carried out in the development of GOcats which involve mapping comparisons to OWLTools' Map2Slim and to UniProt's Subcellular Location Controlled Vocabulary, please install the "Additional Packages" listed under the Prerequisites section and see the Running the Tests section.
17+
18+
### Prerequisites
19+
20+
#### Generating GOcats category mapping and mapping GAFs (standard usage)
21+
22+
##### Python3 / pip
23+
24+
Fedora 24
25+
```
26+
sudo dnf install python3-devel
27+
sudo dnf install python3-pip
28+
```
29+
30+
Ubuntu 16.04
31+
```
32+
sudo apt-get install python3-dev
33+
sudo apt-get install python3-pip
34+
```
35+
36+
##### Docopt / JSONPickle
37+
38+
Fedora 24 / Ubuntu 16.04
39+
```
40+
sudo pip3 install docopt
41+
sudo pip3 install jsonpickle
42+
```
43+
44+
#### Additional Packages (for running development tests, and scripts for producing manuscript results)
45+
46+
##### OWLTools prerequisites (see Installing OWLTools under Installing or visit https://github.com/owlcollab/owltools):
47+
48+
###### Maven / Java
49+
50+
Fedora 24
51+
```
52+
sudo dnf install maven java-1.8.0-openjdk-devel
53+
```
54+
55+
Ubuntu
56+
```
57+
sudo apt-get install maven openjdk-8-jdk
58+
```
59+
60+
#### Plotting figures and tables
61+
62+
Fedora 24
63+
```
64+
sudo dnf install gcc-c++ libpng-devel freetype-devel libffi-devel python3-tkinter
65+
sudo pip3 install --upgrade pip
66+
sudo pip3 install numpy pandas tabulate cairocffi pyupset py2cytoscape matplotlib
67+
```
68+
Ubuntu 16.04
69+
```
70+
sudo apt-get install gcc libpng-dev freetype2-demos libffi-dev python3-tk
71+
sudo pip3 install --upgrade pip
72+
sudo pip3 install numpy pandas tabulate cairocffi pyupset py2cytoscape matplotlib
73+
```
74+
75+
### Installing
76+
77+
#### GOcats
78+
79+
Clone the repo after installing the dependencies (you will need permission to
80+
access the gitlab server. If you do not have access, you probably got this
81+
project directory from FigShare, in which case these steps are unnecessary).
82+
```
83+
cd
84+
git clone https://[email protected]/eugene/GOcats.git
85+
```
86+
87+
Checkout the manuscript_3 branch for the most recent version
88+
```
89+
cd GOcats
90+
git fetch
91+
git checkout manuscript_3
92+
```
93+
94+
#### OWLTools (optional)
95+
96+
Clone the repo after installing the dependencies
97+
```
98+
cd
99+
git clone https://github.com/owlcollab/owltools
100+
```
101+
102+
Install owltools using maven
103+
```
104+
cd ~/owltools/OWLTools-Parent
105+
mvn clean package
106+
```
107+
108+
You may get build errors. If this happens, I found that the following command, which skips running the tests, gets around them without affecting the usage in this project:
109+
```
110+
mvn clean package -D maven.test.skip.exec=true
111+
```
112+
113+
#### Example usage
114+
115+
Creating a mapping of GO terms from the Gene Ontology using a category file
116+
```
117+
python3 ~/GOcats/gocats/gocats.py create_subgraphs /path_to_ontology_file ~/ARK.GOcats/gocats/exampledata/examplecategories.csv ~/Output --supergraph_namespace=cellular_component --subgraph_namespace=cellular_component --output_termlist
118+
```
119+
This will output several files in the 'Output' directory including:
120+
```
121+
GC_content_mapping.json_pickle # A python dictionary with category-defining GO terms as keys and a list of all subgraph contents as values.
122+
GC_id_mapping.json_pickle # A python dictionary with every GO term of the specified namespace as keys and a list of category root terms as values.
123+
```
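The two dictionaries described above are roughly inverses of each other. The sketch below shows that relationship with plain Python dictionaries; it does not read the `.json_pickle` files, the function name `invert_content_mapping` is hypothetical, and it assumes each category's content list includes the category root term itself.

```python
def invert_content_mapping(content_mapping):
    """Turn {category root: [member terms]} into {term: [category roots]}.

    Hypothetical helper illustrating how the two mappings relate.
    """
    id_mapping = {}
    for category_root, members in content_mapping.items():
        for term in members:
            id_mapping.setdefault(term, []).append(category_root)
    return id_mapping


# Toy content mapping: a term may belong to more than one category.
content_mapping = {
    "GO:0005739": ["GO:0005739", "GO:0005743"],
    "GO:0016020": ["GO:0016020", "GO:0005743"],
}
id_mapping = invert_content_mapping(content_mapping)
```

Terms that fall in no category would not appear in the inverted result, whereas the README states that GC_id_mapping has every term of the namespace as a key.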
124+
125+
Mapping GO terms in a GAF
126+
```
127+
python3 ~/GOcats/gocats/gocats.py categorize_dataset YOUR_GAF.goa YOUR_OUTPUT_DIRECTORY/GC_id_mapping.json_pickle YOUR_OUTPUT_DIRECTORY MAPPED_GAF_NAME.goa
128+
```
129+
130+
## Running the tests and producing manuscript results
131+
132+
##### The following run scripts are located in GOcats/runscripts. See doc strings in each script for information on how to run each. NOTE: All prerequisites must be installed before running the following scripts. Make sure to check each script to ensure that the installation path to OWLTools is correct.
133+
134+
**run.sh** - This script runs all figure and table-producing scripts in the GOcats/runscripts directory and places output tables, figures and data in the specified <output_dir>.
135+
136+
**GenerateHindererCategories.sh** - Used to produce S1 and Tables 1 and 2. This script produces inclusion index values, Jaccard index values, and other information for the example subgraph categories described by Hinderer and Moseley.
137+
138+
**GenerateHPAMappingComparison.sh** - Used to produce Figures 7a and 8a and Tables 6 and 8. Note: Requires OWLTools Map2Slim (https://github.com/owlcollab/owltools/wiki/Map2Slim). The script assumes OWLTools is installed under ~/owltools; if not, edit OWLTOOLS_DIR to the appropriate directory.
139+
140+
**GenerateGenericHPAMappingComparison.sh** - Used to produce Figures 7b and 8b. This script produces knowledgebase mappings from the HPA raw data and from the knowledgebases to a set of categories representing a more generic version of HPA's localization annotations. These were chosen by Hinderer and Moseley to resolve discrepancies in term granularity observed between knowledgebase annotations and experimental data annotations.
141+
142+
**GenerateVisualizationData.sh** - Used to produce data for Figure 3a-c. Network tables produced by this script can be loaded into Cytoscape for network visualization. GOcats/runscripts/run.sh can automatically load and format the Cytoscape networks if an active Cytoscape session is open on port 1234. To do this, navigate to your Cytoscape directory and run the following before executing run.sh: sh cytoscape.sh -R 1234
143+
144+
**SpeedTest.sh** - Used to report speed comparisons between GOcats and Map2Slim.
145+
146+
##### The following test and supporting scripts are located in GOcats/gocats:
147+
148+
**hpmappingtesting.py** - Produces the data for Table 4.
149+
150+
**gofull.py** - Used to gather graph information across all of GO or specific sections of GO. Specifically used to gather information about the number of each relation in GO.
151+
152+
**plotfigures.py** - Creates figures 7a-b and 8a-b from the data produced from other run scripts. Be sure to run all run scripts and note the
153+
location of the output directories before running this script.
154+
155+
**cytoscapegraph.py** - Loads and automatically formats visualization data produced by GenerateVisualizationData.sh in an active Cytoscape session. See comments in GOcats/runscripts/run.sh for more information.
156+
157+
**testfindancestors.py** - Creates ancestor lists of GO terms from annotations in a Gene Annotation File using several methods of ancestor finding.
158+
159+
##### The following run scripts are located in GOcats/gocats/tests/Map2SlimMappingTest:
160+
161+
**run.sh** - Produces the data used in Table 5. Once run, the results are stored in GOcats/gocats/tests/Map2SlimMappingTest/logs. NOTE! These scripts contain custom commands for a TORQUE cluster that can only be run in-house and are thus not reproducible outside of our lab. Contact corresponding author for questions.
162+
163+
##### Other results:
164+
165+
Information for Table 7 was entered manually to describe how the custom generic categories encompassed the previously-used categories.
166+
167+
Information for Table 9 was compiled using the build_graph_interpreter command in gocats.py for each constraint (all GO, cellular_component, molecular_function, and biological_process) and accessing the graph object's 'relationship_count' variable to tally the use of each relationship type. The rest of the information was entered manually.
168+
169+
## Authors
170+
171+
* **Eugene Hinderer** - [ehinderer](https://github.com/ehinderer)

README.rst

+36-11
@@ -1,18 +1,16 @@
11
GOcats
22
======
33

4-
.. image:: https://raw.githubusercontent.com/MoseleyBioinformaticsLab/GOcats/master/doc/_static/images/GOcats_logo.png
4+
.. image:: _static/images/GOcats_logo.png
55
:width: 50%
6-
:align: right
6+
:align: center
77
:target: http://gocats.readthedocs.io/
88

99
`GOcats` is an Open Biomedical Ontology (OBO) parser and categorizing utility--currently specialized for the Gene
1010
Ontology (GO)--which can help scientists interpret large-scale experimental results by organizing redundant and highly-
1111
specific annotations into customizable, biologically-relevant concept categories. Concept subgraphs are defined by lists
1212
of keywords created by the user.
1313

14-
Full API documentation, userguide, and tutorial can be found on readthedocs_
15-
1614
Currently, the `GOcats` package can be used to:
1715
* Create subgraphs of GO which each represent a user-specified concept.
1816
* Map specific, or fine-grained, GO terms in a Gene Annotation File (GAF) to an arbitrary number of concept
@@ -57,7 +55,7 @@ Dependencies
5755
`GOcats` requires the following Python libraries:
5856

5957
* docopt_ for creating the :mod:`gocats` command-line interface.
60-
* jsonpickle_ for saving Python objects in a JSON serializable form and outputting to a file.
58+
* JSONPickle_ for saving Python objects in a JSON serializable form and outputting to a file.
6159

6260
To install dependencies manually:
6361

@@ -96,20 +94,47 @@ GAF mappings can also be made from the command line:
9694
License
9795
~~~~~~~
9896

99-
.. include:: ../LICENSE
97+
Made available under the terms of The Clear BSD License. See full license in LICENSE.
98+
99+
The Clear BSD License
100+
101+
Copyright (c) 2017, Eugene W. Hinderer III, Hunter N.B. Moseley
102+
All rights reserved.
103+
104+
Redistribution and use in source and binary forms, with or without
105+
modification, are permitted (subject to the limitations in the disclaimer
106+
below) provided that the following conditions are met:
107+
108+
* Redistributions of source code must retain the above copyright notice, this
109+
list of conditions and the following disclaimer.
110+
111+
* Redistributions in binary form must reproduce the above copyright notice,
112+
this list of conditions and the following disclaimer in the documentation
113+
and/or other materials provided with the distribution.
114+
115+
* Neither the name of the copyright holder nor the names of its contributors may be used
116+
to endorse or promote products derived from this software without specific
117+
prior written permission.
100118

101-
Made available under the terms of The Clear BSD License. See full license in LICENSE_.
119+
NO EXPRESS OR IMPLIED LICENSES TO ANY PARTY'S PATENT RIGHTS ARE GRANTED BY THIS
120+
LICENSE. THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
121+
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO,
122+
THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
123+
ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE
124+
LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
125+
CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE
126+
GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
127+
HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
128+
LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
129+
OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH
130+
DAMAGE.
102131

103132
Authors
104133
~~~~~~~
105134

106135
* **Eugene W. Hinderer III** - ehinderer_
107136
* **Hunter N.B. Moseley** - hunter-moseley_
108137

109-
.. _readthedocs: http://gocats.readthedocs.io/en/latest/
110-
.. _docopt: https://github.com/docopt/docopt
111-
.. _jsonpickle: https://jsonpickle.github.io/
112138
.. _git: https://git-scm.com/book/en/v2/Getting-Started-Installing-Git/
113139
.. _ehinderer: https://github.com/ehinderer
114140
.. _hunter-moseley: https://github.com/hunter-moseley
115-
.. _LICENSE: https://gocats.readthedocs.io/en/latest/#license

doc/_build/html/_modules/gocats/gocats.html

+6-6
@@ -268,17 +268,17 @@ <h1>Source code for gocats.gocats</h1><div class="highlight"><pre>
268268
<span class="sd">&quot;&quot;&quot;Reads in a Gene Annotation File (GAF) and maps the annotations contained therein to the categories organized by</span>
269269
<span class="sd"> GOcats or other methods. Outputs a mapped GAF and a list of unmapped genes in the specified output directory.</span>
270270

271-
<span class="sd"> :param gaf_dataset: A Gene Annotation File.</span>
271+
<span class="sd"> :param dataset_file: A Gene Annotation File.</span>
272272
<span class="sd"> :param term_mapping: A dictionary mapping category-defining ontology terms to their subgraph children terms. May be produced by GOcats or another method.</span>
273273
<span class="sd"> :param output_directory: Specify the directory where the output file will be stored.</span>
274-
<span class="sd"> :param GAF_name: Specify the desired name of the mapped GAF.</span>
274+
<span class="sd"> :param mapped_dataset_filename: Specify the desired name of the mapped GAF.</span>
275275
<span class="sd"> :return: None</span>
276276
<span class="sd"> :rtype: :py:obj:`None`</span>
277277
<span class="sd"> &quot;&quot;&quot;</span>
278-
<span class="n">loaded_gaf_array</span> <span class="o">=</span> <span class="n">tools</span><span class="o">.</span><span class="n">parse_gaf</span><span class="p">(</span><span class="n">args</span><span class="p">[</span><span class="s1">&#39;&lt;gaf_dataset&gt;&#39;</span><span class="p">])</span>
278+
<span class="n">loaded_gaf_array</span> <span class="o">=</span> <span class="n">tools</span><span class="o">.</span><span class="n">parse_gaf</span><span class="p">(</span><span class="n">args</span><span class="p">[</span><span class="s1">&#39;&lt;dataset_file&gt;&#39;</span><span class="p">])</span>
279279
<span class="n">mapping_dict</span> <span class="o">=</span> <span class="n">tools</span><span class="o">.</span><span class="n">jsonpickle_load</span><span class="p">(</span><span class="n">args</span><span class="p">[</span><span class="s1">&#39;&lt;term_mapping&gt;&#39;</span><span class="p">])</span>
280280
<span class="n">output_directory</span> <span class="o">=</span> <span class="n">os</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">realpath</span><span class="p">(</span><span class="n">args</span><span class="p">[</span><span class="s1">&#39;&lt;output_directory&gt;&#39;</span><span class="p">])</span>
281-
<span class="n">gaf_name</span> <span class="o">=</span> <span class="n">args</span><span class="p">[</span><span class="s1">&#39;&lt;GAF_name&gt;&#39;</span><span class="p">]</span>
281+
<span class="n">mapped_dataset_filename</span> <span class="o">=</span> <span class="n">args</span><span class="p">[</span><span class="s1">&#39;&lt;mapped_dataset_filename&gt;&#39;</span><span class="p">]</span>
282282
<span class="n">mapped_gaf_array</span> <span class="o">=</span> <span class="nb">list</span><span class="p">()</span>
283283
<span class="n">unmapped_genes</span> <span class="o">=</span> <span class="nb">set</span><span class="p">()</span>
284284

@@ -294,8 +294,8 @@ <h1>Source code for gocats.gocats</h1><div class="highlight"><pre>
294294
<span class="n">unmapped_genes</span><span class="o">.</span><span class="n">add</span><span class="p">(</span><span class="s1">&#39;NO_GENE:&#39;</span> <span class="o">+</span> <span class="n">line</span><span class="p">[</span><span class="mi">1</span><span class="p">])</span>
295295
<span class="k">else</span><span class="p">:</span>
296296
<span class="n">unmapped_genes</span><span class="o">.</span><span class="n">add</span><span class="p">(</span><span class="n">line</span><span class="p">[</span><span class="mi">2</span><span class="p">])</span>
297-
<span class="n">tools</span><span class="o">.</span><span class="n">write_out_gaf</span><span class="p">(</span><span class="n">mapped_gaf_array</span><span class="p">,</span> <span class="n">os</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="n">output_directory</span><span class="p">,</span> <span class="n">gaf_name</span><span class="p">))</span>
298-
<span class="n">tools</span><span class="o">.</span><span class="n">list_to_file</span><span class="p">(</span><span class="n">os</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="n">output_directory</span><span class="p">,</span> <span class="n">gaf_name</span> <span class="o">+</span> <span class="s1">&#39;_unmappedGenes&#39;</span><span class="p">),</span> <span class="n">unmapped_genes</span><span class="p">)</span></div>
297+
<span class="n">tools</span><span class="o">.</span><span class="n">write_out_gaf</span><span class="p">(</span><span class="n">mapped_gaf_array</span><span class="p">,</span> <span class="n">os</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="n">output_directory</span><span class="p">,</span> <span class="n">mapped_dataset_filename</span><span class="p">))</span>
298+
<span class="n">tools</span><span class="o">.</span><span class="n">list_to_file</span><span class="p">(</span><span class="n">os</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="n">output_directory</span><span class="p">,</span> <span class="n">mapped_dataset_filename</span> <span class="o">+</span> <span class="s1">&#39;_unmappedGenes&#39;</span><span class="p">),</span> <span class="n">unmapped_genes</span><span class="p">)</span></div>
299299
</pre></div>
300300

301301
</div>
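The hunks above show only the renamed arguments and the final write-out, not the mapping loop. As a rough sketch of what such a loop might look like (an assumption, not the GOcats implementation): each GAF row is a list of tab-separated columns, with the DB Object ID at index 1, the DB Object Symbol at index 2 and the GO ID at index 4. The function name `map_gaf_rows`, the replacement of the GO ID by the category term, and the empty-symbol test for the `NO_GENE:` case are all assumptions.

```python
def map_gaf_rows(rows, id_mapping):
    """Map each GAF row's GO ID to its categories; collect unmapped genes.

    Hypothetical helper; rows are lists of GAF column values.
    """
    mapped_rows = []
    unmapped_genes = set()
    for row in rows:
        categories = id_mapping.get(row[4], [])
        if categories:
            # Emit one output row per category, with the GO ID replaced.
            for category in categories:
                mapped_rows.append(row[:4] + [category] + row[5:])
        elif row[2] == "":
            # Assumed condition: no gene symbol, so fall back to the object ID.
            unmapped_genes.add("NO_GENE:" + row[1])
        else:
            unmapped_genes.add(row[2])
    return mapped_rows, unmapped_genes


# Made-up rows, truncated to six columns for brevity.
rows = [
    ["UniProtKB", "P00001", "GENE1", "", "GO:0005743", "PMID:1"],
    ["UniProtKB", "P00002", "GENE2", "", "GO:0099999", "PMID:2"],
    ["UniProtKB", "P00003", "", "", "GO:0099999", "PMID:3"],
]
id_mapping = {"GO:0005743": ["GO:0005739"]}
mapped_rows, unmapped_genes = map_gaf_rows(rows, id_mapping)
```

In this toy run, the first row is rewritten to its category term and the other two genes end up in the unmapped set.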

doc/_build/html/api.html

+2-2
@@ -97,10 +97,10 @@ <h2>The Gene Ontology Categories Suite (GOcats)<a class="headerlink" href="#the-
9797
<col class="field-body" />
9898
<tbody valign="top">
9999
<tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple">
100-
<li><strong>gaf_dataset</strong> &#8211; A Gene Annotation File.</li>
100+
<li><strong>dataset_file</strong> &#8211; A Gene Annotation File.</li>
101101
<li><strong>term_mapping</strong> &#8211; A dictionary mapping category-defining ontology terms to their subgraph children terms. May be produced by GOcats or another method.</li>
102102
<li><strong>output_directory</strong> &#8211; Specify the directory where the output file will be stored.</li>
103-
<li><strong>GAF_name</strong> &#8211; Specify the desired name of the mapped GAF.</li>
103+
<li><strong>mapped_dataset_filename</strong> &#8211; Specify the desired name of the mapped GAF.</li>
104104
</ul>
105105
</td>
106106
</tr>
