Commit 4dd5b77

committed Feb 10, 2016
updated READMEs
1 parent 71ea5bf commit 4dd5b77

2 files changed: +180 -138 lines changed

‎README.md

+25-138
@@ -1,155 +1,42 @@
1-
# Wikidata-Toolkit-Examples
1+
# Wikidata Toolkit Examples
22

3-
This repository contains example programs that show some of the features
4-
of Wikidata Toolkit.
3+
This is an example project that shows how to set up a Java project that
4+
uses [Wikidata Toolkit](https://github.com/Wikidata/Wikidata-Toolkit).
5+
It contains several simple example programs and bots in the source directory.
56

6-
Overview and Settings
7-
---------------------
8-
9-
A detailed guide to each of the examples is given below. Many examples process data
10-
dumps exported by Wikidata. In most cases, the example only contains the actual
11-
processing code that does something interesting. The code for downloading dumps and
12-
iterating over them is in the ExampleHelpers.java class, which is used in many examples
13-
for common tasks.
14-
15-
You can edit the static members in ExampleHelpers to select which dumps should be
16-
used (the data is available in several formats which may be more or less recent
17-
and more or less comprehensive). You can also switch to offline mode there: then
18-
only the files downloaded previously will be used. This is convenient for testing
19-
to avoid downloading new files when you don't really need absolutely current data.
20-
By default, the code will fetch the most recent JSON dumps from the Web.
21-
22-
Some examples write their output to files. These files are put into the subdirectory
23-
"results" under the directory from where the application is run. Files in CSV
24-
format can be loaded in any spreadsheet tool to make diagrams, for example.
25-
26-
Guide to the Available Examples
7+
What's found in this repository
278
-------------------------------
289

29-
Ordered roughly from basic to advanced/specific.
30-
31-
#### EntityStatisticsProcessor.java ####
32-
33-
This program processes entities (items and properties) to collect some basic
34-
statistics. It counts how many items and properties there are, the number of labels,
35-
descriptions, and aliases, and the number of statements. This code might be useful
36-
to get to know the basic data structures where these things are stored. The example
37-
also counts the usage of each property in more details: its use in the main part
38-
of statements, in qualifiers, and in references is counted separately. The results
39-
for this are written into a CSV file in the end.
40-
41-
#### FetchOnlineDataExample.java ####
42-
43-
This program shows how to fetch live data from wikidata.org via the Web API. This can
44-
be used with any other Wikibase site as well. It is not practical to fetch all data
45-
in this way, but it can be very convenient to get some data directly even when processing
46-
a dump (since the dump can only be read in sequence).
47-
48-
#### EditOnlineDataExample.java ####
49-
50-
This program shows how to create and modify live data on test.wikidata.org via the Web API.
51-
This can be used with any other Wikibase site as well. The example first creates a new item
52-
with some starting data, then adds some additional statements, and finally modifies and
53-
deletes existing statements. All data modifications automatically use revision ids to make
54-
sure that no edit conflicts occur (and we don't modify/delete data that is different from
55-
what we expect).
56-
57-
#### LocalDumpFileExample.java ####
58-
59-
This program shows how to process a data dump that is available in a local file, rather
60-
than being automatically downloaded (and possibly cached) from the Wikimedia site.
61-
62-
#### GreatestNumberProcessor.java ####
63-
64-
This simple program looks at all values of a number property to find the item with the
65-
greatest value. It will print the result to the console. In most cases, the item with
66-
the greatest number is fairly early in the data export, so watching the program work is
67-
not too exciting, but it shows how to read a single property value to do something with
68-
it. The property that is used is defined by a constant in the code and can be changed to
69-
see some other greatest values.
70-
71-
#### LifeExpectancyProcessor.java ####
72-
73-
This program processes items to compute the average life expectancy of people on
74-
Wikidata. It shows how to get details (here: year numbers) of specific statement values
75-
for specific properties (here we use Wikidata's P569 "birth date" and P570 "death date").
76-
The results are stored in a CSV file that shows average life expectancy by year of
77-
birth. The overall average is also printed to the output.
78-
79-
#### WorldMapProcessor.java ####
80-
81-
This program generates images of world maps based on the locations of Wikidata items,
82-
and stores the result in PNG files. The example builds several maps, for Wikidata as
83-
a whole and for several big Wikipedias (counting only items with an article in there).
84-
The code offers easy-to-adjust parameters for the size of the output images, the
85-
Wikimedia projects to consider, and the scale of the color values.
86-
87-
[Wikidata world maps for June 2015](https://ddll.inf.tu-dresden.de/web/Wikidata/Maps-06-2015/en)
88-
89-
#### GenderRatioProcessor.java ####
90-
91-
This program uses Wikidata to analyse the number of articles that exist on certain
92-
topics in different Wikimedia projects (esp. in Wikipedias). In particular, it counts
93-
the number of articles about humans and humans of a specific gender (female, male, etc.).
94-
Can be used to estimate the gender balance of various Wikipedias. The results are stored
95-
in a CSV file (all projects x all genders), but for the largest projects they are also
96-
printed to the output. This example is inspired by Max Klein's work on this topic.
97-
98-
[Related blog post by Max Klein](http://notconfusing.com/sex-ratios-in-wikidata-part-iii/)
99-
100-
#### JsonSerializationProcessor.java ####
101-
102-
This program creates a JSON file that contains English language terms, birthdate, occupation,
103-
and image for all people on Wikidata who were born in Dresden (the code can easily be
104-
modified to make a different selection). The example shows how to serialize Wikidata Toolkit
105-
objects in JSON, how to select item documents by a property, and how to filter documents to
106-
ignore some of the data. The resulting file is small (less than 1M).
107-
108-
#### SitelinksExample.java ####
10+
The individual examples are documented in the README file of each package.
10911

110-
This program shows how to get information about the site links that are used in Wikidata
111-
dumps. The links to Wikimedia projects use keys like "enwiki" for English Wikipedia or
112-
"hewikivoyage" for Hebrew WikiVoyage. To find out the meaning of these codes, and to
113-
create URLs for the articles on these projects, Wikidata Toolkit includes some simple
114-
functions that download and process the site links information for a given project.
115-
This example shows how to use this functionality.
11612

117-
#### ClassPropertyUsageExample.java ####
13+
Running examples using an IDE
14+
-----------------------------
11815

119-
This advanced program analyses the use of properties and classes on Wikidata, and creates
120-
output that can be used in the [Miga data browser](http://migadv.com/). You can see the
121-
result online at http://tools.wmflabs.org/wikidata-exports/miga/. The program is slightly
122-
more complex, involving several processing steps and additional code for formatting output
123-
for CSV files.
16+
You can import the project into any Java IDE that supports Maven (and maybe git)
17+
and run the example programs from there. Wikidata Toolkit provides detailed
18+
[instructions on how to set up Eclipse for using Maven and git](https://www.mediawiki.org/wiki/Wikidata_Toolkit/Eclipse_setup).
12419

125-
#### RdfSerializationExample.java ####
12620

127-
This program creates an RDF export. You can also do this directly using the command line
128-
client. The purpose of this program is just to show how this could be done in code, e.g.,
129-
to implement additional pre-processing before the RDF serialisation.
21+
Running examples directly using Maven
22+
-------------------------------------
13023

24+
You can also run the code directly using Maven from the command line. For this,
25+
you need to have Maven and (obviously) Java installed. To compile the project
26+
and obtain necessary dependencies, run
13127

132-
Other Helper Code
133-
-----------------
28+
```mvn compile```
13429

135-
#### ExampleHelpers.java ####
30+
Thereafter, you can run any individual example using its Java class name, for
31+
example:
13632

137-
This class provides static helper methods to iterate through dumps, to configure the
138-
desired logging behaviour, and to write files to the "results" directory. It also allows
139-
you to change some global settings that will affect most examples. The code is of interest
140-
if you want to find out how to build a standalone application that includes all aspects
141-
without relying on the example module.
33+
```mvn exec:java -Dexec.mainClass="examples.FetchOnlineDataExample"```
14234

143-
#### EntityTimerProcessor.java ####
35+
Credits and License
36+
-------------------
14437

145-
This is a helper class that is used in all examples to print basic timer information and
146-
to provide support for having a timeout (cleanly abort processing after a fixed time, even
147-
if the dump would take much longer to complete; useful for testing). It should not be of
148-
primary interest for learning how to use Wikidata Toolkit, but you can have a look to find
149-
out how to use our Timer class.
38+
This project is copied from the [Wikidata Toolkit](https://github.com/Wikidata/Wikidata-Toolkit) examples module.
39+
Authors can be found there.
15040

151-
Additional Resources
152-
--------------------
41+
License: [Apache 2.0](LICENSE)
15342

154-
* [Wikidata Toolkit homepage](https://www.mediawiki.org/wiki/Wikidata_Toolkit)
155-
* [Wikidata Toolkit Javadocs](http://wikidata.github.io/Wikidata-Toolkit/)

‎src/examples/README.md

+155
@@ -0,0 +1,155 @@
1+
# Example Package
2+
3+
This package contains stand-alone Java example programs. Some standard functions
4+
are in the file ExampleHelpers. You can copy from there for your own projects.
5+
6+
Overview and Settings
7+
---------------------
8+
9+
A detailed guide to each of the examples is given below. Many examples process data
10+
dumps exported by Wikidata. In most cases, the example only contains the actual
11+
processing code that does something interesting. The code for downloading dumps and
12+
iterating over them is in the ExampleHelpers.java class, which is used in many examples
13+
for common tasks.
14+
15+
You can edit the static members in ExampleHelpers to select which dumps should be
16+
used (the data is available in several formats which may be more or less recent
17+
and more or less comprehensive). You can also switch to offline mode there: then
18+
only the files downloaded previously will be used. This is convenient for testing
19+
to avoid downloading new files when you don't really need absolutely current data.
20+
By default, the code will fetch the most recent JSON dumps from the Web.
21+
22+
Some examples write their output to files. These files are put into the subdirectory
23+
"results" under the directory from where the application is run. Files in CSV
24+
format can be loaded in any spreadsheet tool to make diagrams, for example.
25+
26+
Guide to the Available Examples
27+
-------------------------------
28+
29+
Ordered roughly from basic to advanced/specific.
30+
31+
#### EntityStatisticsProcessor.java ####
32+
33+
This program processes entities (items and properties) to collect some basic
34+
statistics. It counts how many items and properties there are, the number of labels,
35+
descriptions, and aliases, and the number of statements. This code might be useful
36+
to get to know the basic data structures where these things are stored. The example
37+
also counts the usage of each property in more detail: its use in the main part
38+
of statements, in qualifiers, and in references is counted separately. The results
39+
for this are written into a CSV file at the end.
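
The per-property counting described above can be sketched with plain Java collections. This is only an illustration under stated assumptions: `PropertyUsageSketch`, `Role`, and the sample property ids are hypothetical stand-ins, and the code does not use the Wikidata Toolkit classes that the real example works with.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: counts how often a property id occurs in each part of a statement.
final class PropertyUsageSketch {

    /** The three places in which a property can be used. */
    enum Role { MAIN, QUALIFIER, REFERENCE }

    // Property id -> per-role counters; TreeMap keeps the CSV rows in a stable order.
    private final Map<String, Map<Role, Integer>> counts = new TreeMap<>();

    /** Records one use of the given property in the given role. */
    void count(String propertyId, Role role) {
        counts.computeIfAbsent(propertyId, id -> new EnumMap<>(Role.class))
              .merge(role, 1, Integer::sum);
    }

    /** Returns the number of recorded uses, or 0 if there are none. */
    int get(String propertyId, Role role) {
        Map<Role, Integer> byRole = counts.get(propertyId);
        return byRole == null ? 0 : byRole.getOrDefault(role, 0);
    }

    /** Renders all counters as CSV text with a header row. */
    String toCsv() {
        StringBuilder sb = new StringBuilder("property,main,qualifier,reference\n");
        for (String id : counts.keySet()) {
            sb.append(id).append(',')
              .append(get(id, Role.MAIN)).append(',')
              .append(get(id, Role.QUALIFIER)).append(',')
              .append(get(id, Role.REFERENCE)).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Made-up sample data, only to show the shape of the output.
        PropertyUsageSketch usage = new PropertyUsageSketch();
        usage.count("P569", Role.MAIN);
        usage.count("P569", Role.MAIN);
        usage.count("P580", Role.QUALIFIER);
        usage.count("P248", Role.REFERENCE);
        System.out.print(usage.toCsv());
    }
}
```

Keeping the three roles in separate counters is what allows the CSV to show where a property is mostly used, rather than a single total.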
40+
41+
#### FetchOnlineDataExample.java ####
42+
43+
This program shows how to fetch live data from wikidata.org via the Web API. This can
44+
be used with any other Wikibase site as well. It is not practical to fetch all data
45+
in this way, but it can be very convenient to get some data directly even when processing
46+
a dump (since the dump can only be read in sequence).
47+
48+
#### EditOnlineDataExample.java ####
49+
50+
This program shows how to create and modify live data on test.wikidata.org via the Web API.
51+
This can be used with any other Wikibase site as well. The example first creates a new item
52+
with some starting data, then adds some additional statements, and finally modifies and
53+
deletes existing statements. All data modifications automatically use revision ids to make
54+
sure that no edit conflicts occur (and we don't modify/delete data that is different from
55+
what we expect).
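
The idea behind the revision-id check can be modelled in a few lines. The sketch below is an in-memory model under stated assumptions, not the Web API or the Wikidata Toolkit editing classes: `RevisionCheckSketch` and its methods are hypothetical, and on a real Wikibase site the comparison is made by the server.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory model of an item that rejects edits based on an outdated revision.
final class RevisionCheckSketch {

    private long revisionId = 1;
    private final List<String> statements = new ArrayList<>();

    long getRevisionId() {
        return revisionId;
    }

    List<String> getStatements() {
        return new ArrayList<>(statements);
    }

    /**
     * Adds a statement only if the caller's base revision is still the current one.
     * Returns the new revision id; throws if the item changed in the meantime.
     */
    long addStatement(String statement, long baseRevisionId) {
        if (baseRevisionId != revisionId) {
            throw new IllegalStateException("Edit conflict: based on revision "
                    + baseRevisionId + " but current revision is " + revisionId);
        }
        statements.add(statement);
        return ++revisionId;
    }

    public static void main(String[] args) {
        RevisionCheckSketch item = new RevisionCheckSketch();
        long seen = item.getRevisionId();            // revision we read the data at
        item.addStatement("first statement", seen);  // succeeds, item moves to the next revision
        try {
            item.addStatement("stale edit", seen);   // still based on the old revision
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The second edit is refused because it is based on data that has since changed, which is the situation the revision ids are meant to catch.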
56+
57+
#### LocalDumpFileExample.java ####
58+
59+
This program shows how to process a data dump that is available in a local file, rather
60+
than being automatically downloaded (and possibly cached) from the Wikimedia site.
61+
62+
#### GreatestNumberProcessor.java ####
63+
64+
This simple program looks at all values of a number property to find the item with the
65+
greatest value. It will print the result to the console. In most cases, the item with
66+
the greatest number is fairly early in the data export, so watching the program work is
67+
not too exciting, but it shows how to read a single property value to do something with
68+
it. The property that is used is defined by a constant in the code and can be changed to
69+
see some other greatest values.
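
Finding the greatest value in a single sequential pass only needs to remember the best candidate seen so far. The following is a minimal sketch of that pattern in plain Java; `GreatestValueSketch` and the sample ids and numbers are hypothetical, and the real example reads the values from Wikidata Toolkit statement objects instead.

```java
import java.math.BigDecimal;

// Hypothetical sketch: remembers the item with the greatest value seen so far.
final class GreatestValueSketch {

    private String bestItemId;
    private BigDecimal bestValue;

    /** Called once per (item, value) pair, in whatever order the data arrives. */
    void process(String itemId, BigDecimal value) {
        if (value == null) {
            return; // the item has no usable value for the property
        }
        if (bestValue == null || value.compareTo(bestValue) > 0) {
            bestValue = value;
            bestItemId = itemId;
        }
    }

    String getBestItemId() {
        return bestItemId;
    }

    BigDecimal getBestValue() {
        return bestValue;
    }

    public static void main(String[] args) {
        // Made-up ids and numbers, only to exercise the logic.
        GreatestValueSketch sketch = new GreatestValueSketch();
        sketch.process("item-a", new BigDecimal("10"));
        sketch.process("item-b", new BigDecimal("1000"));
        sketch.process("item-c", null);
        sketch.process("item-d", new BigDecimal("500"));
        System.out.println(sketch.getBestItemId() + " has the greatest value: " + sketch.getBestValue());
    }
}
```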
70+
71+
#### LifeExpectancyProcessor.java ####
72+
73+
This program processes items to compute the average life expectancy of people on
74+
Wikidata. It shows how to get details (here: year numbers) of specific statement values
75+
for specific properties (here we use Wikidata's P569 "birth date" and P570 "death date").
76+
The results are stored in a CSV file that shows average life expectancy by year of
77+
birth. The overall average is also printed to the output.
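
The aggregation step can be sketched independently of the dump processing. The code below assumes that birth and death years have already been extracted, and approximates the lifespan as the difference of the two year numbers; `LifeExpectancySketch` and its methods are hypothetical names, and the actual example may filter or round differently.

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: averages lifespans per year of birth and overall.
final class LifeExpectancySketch {

    // Birth year -> { sum of lifespans, number of people }.
    private final Map<Integer, long[]> byBirthYear = new TreeMap<>();
    private long totalLifeSpan;
    private long totalPeople;

    /** Records one person; pairs with a death year before the birth year are skipped. */
    void addPerson(int birthYear, int deathYear) {
        int lifeSpan = deathYear - birthYear;
        if (lifeSpan < 0) {
            return;
        }
        long[] entry = byBirthYear.computeIfAbsent(birthYear, y -> new long[2]);
        entry[0] += lifeSpan;
        entry[1]++;
        totalLifeSpan += lifeSpan;
        totalPeople++;
    }

    /** Average lifespan for one birth year, or NaN if no data was recorded. */
    double averageFor(int birthYear) {
        long[] entry = byBirthYear.get(birthYear);
        return entry == null ? Double.NaN : (double) entry[0] / entry[1];
    }

    /** Average lifespan over all recorded people, or NaN if there are none. */
    double overallAverage() {
        return totalPeople == 0 ? Double.NaN : (double) totalLifeSpan / totalPeople;
    }

    /** One CSV row per birth year, in ascending order of year. */
    String toCsv() {
        StringBuilder sb = new StringBuilder("birth year,average life expectancy\n");
        for (Map.Entry<Integer, long[]> e : byBirthYear.entrySet()) {
            double average = (double) e.getValue()[0] / e.getValue()[1];
            sb.append(String.format(Locale.ROOT, "%d,%.2f\n", e.getKey(), average));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Made-up sample data.
        LifeExpectancySketch sketch = new LifeExpectancySketch();
        sketch.addPerson(1900, 1980);
        sketch.addPerson(1900, 1960);
        sketch.addPerson(1950, 2000);
        System.out.print(sketch.toCsv());
        System.out.println("Overall average: " + sketch.overallAverage());
    }
}
```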
78+
79+
#### WorldMapProcessor.java ####
80+
81+
This program generates images of world maps based on the locations of Wikidata items,
82+
and stores the result in PNG files. The example builds several maps, for Wikidata as
83+
a whole and for several big Wikipedias (counting only items with an article in there).
84+
The code offers easy-to-adjust parameters for the size of the output images, the
85+
Wikimedia projects to consider, and the scale of the color values.
86+
87+
[Wikidata world maps for June 2015](https://ddll.inf.tu-dresden.de/web/Wikidata/Maps-06-2015/en)
88+
89+
#### GenderRatioProcessor.java ####
90+
91+
This program uses Wikidata to analyse the number of articles that exist on certain
92+
topics in different Wikimedia projects (esp. in Wikipedias). In particular, it counts
93+
the number of articles about humans and humans of a specific gender (female, male, etc.).
94+
It can be used to estimate the gender balance of various Wikipedias. The results are stored
95+
in a CSV file (all projects x all genders), but for the largest projects they are also
96+
printed to the output. This example is inspired by Max Klein's work on this topic.
97+
98+
[Related blog post by Max Klein](http://notconfusing.com/sex-ratios-in-wikidata-part-iii/)
99+
100+
#### JsonSerializationProcessor.java ####
101+
102+
This program creates a JSON file that contains English language terms, birthdate, occupation,
103+
and image for all people on Wikidata who were born in Dresden (the code can easily be
104+
modified to make a different selection). The example shows how to serialize Wikidata Toolkit
105+
objects in JSON, how to select item documents by a property, and how to filter documents to
106+
ignore some of the data. The resulting file is small (less than 1M).
107+
108+
#### SitelinksExample.java ####
109+
110+
This program shows how to get information about the site links that are used in Wikidata
111+
dumps. The links to Wikimedia projects use keys like "enwiki" for English Wikipedia or
112+
"hewikivoyage" for Hebrew WikiVoyage. To find out the meaning of these codes, and to
113+
create URLs for the articles on these projects, Wikidata Toolkit includes some simple
114+
functions that download and process the site links information for a given project.
115+
This example shows how to use this functionality.
116+
117+
#### ClassPropertyUsageExample.java ####
118+
119+
This advanced program analyses the use of properties and classes on Wikidata, and creates
120+
output that can be used in the [Miga data browser](http://migadv.com/). You can see the
121+
result online at http://tools.wmflabs.org/wikidata-exports/miga/. The program is slightly
122+
more complex, involving several processing steps and additional code for formatting output
123+
for CSV files.
124+
125+
#### RdfSerializationExample.java ####
126+
127+
This program creates an RDF export. You can also do this directly using the command line
128+
client. The purpose of this program is just to show how this could be done in code, e.g.,
129+
to implement additional pre-processing before the RDF serialisation.
130+
131+
132+
Other Helper Code
133+
-----------------
134+
135+
#### ExampleHelpers.java ####
136+
137+
This class provides static helper methods to iterate through dumps, to configure the
138+
desired logging behaviour, and to write files to the "results" directory. It also allows
139+
you to change some global settings that will affect most examples. The code is of interest
140+
if you want to find out how to build a standalone application that includes all aspects
141+
without relying on the example module.
142+
143+
#### EntityTimerProcessor.java ####
144+
145+
This is a helper class that is used in all examples to print basic timer information and
146+
to provide support for having a timeout (cleanly abort processing after a fixed time, even
147+
if the dump would take much longer to complete; useful for testing). It should not be of
148+
primary interest for learning how to use Wikidata Toolkit, but you can have a look to find
149+
out how to use our Timer class.
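
The timeout idea can be illustrated without the toolkit's own classes. The sketch below is an assumption-laden stand-in, not the EntityTimerProcessor implementation: `TimeoutSketch` and `ProcessingTimeoutException` are hypothetical names, and the clock is injected so that the behaviour can be tried out without waiting.

```java
import java.util.function.LongSupplier;

// Hypothetical sketch: counts processed entities and aborts once a time budget is used up.
final class TimeoutSketch {

    /** Thrown to abort processing cleanly when the time budget is exhausted. */
    static final class ProcessingTimeoutException extends RuntimeException {
        final long processedEntities;

        ProcessingTimeoutException(long processedEntities) {
            super("Timeout after " + processedEntities + " entities");
            this.processedEntities = processedEntities;
        }
    }

    private final long timeoutMillis;
    private final LongSupplier clockMillis;
    private long startMillis = -1;
    private long processed;

    /** A timeout of 0 or less disables the check. */
    TimeoutSketch(long timeoutMillis, LongSupplier clockMillis) {
        this.timeoutMillis = timeoutMillis;
        this.clockMillis = clockMillis;
    }

    /** Call once per processed entity; the timer starts with the first call. */
    void onEntityProcessed() {
        long now = clockMillis.getAsLong();
        if (startMillis < 0) {
            startMillis = now;
        }
        processed++;
        if (timeoutMillis > 0 && now - startMillis >= timeoutMillis) {
            throw new ProcessingTimeoutException(processed);
        }
    }

    public static void main(String[] args) {
        // Fake clock: every entity "takes" 400 ms, the budget is 1000 ms.
        long[] fakeNow = {0};
        TimeoutSketch timer = new TimeoutSketch(1000, () -> fakeNow[0]);
        try {
            while (true) {
                timer.onEntityProcessed();
                fakeNow[0] += 400;
            }
        } catch (ProcessingTimeoutException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In a real run, `System::currentTimeMillis` could be passed as the clock; the caller that drives the processing loop is then responsible for catching the exception and finishing up.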
150+
151+
Additional Resources
152+
--------------------
153+
154+
* [Wikidata Toolkit homepage](https://www.mediawiki.org/wiki/Wikidata_Toolkit)
155+
* [Wikidata Toolkit Javadocs](http://wikidata.github.io/Wikidata-Toolkit/)
