-# Wikidata-Toolkit-Examples
+# Wikidata Toolkit Examples
 
-This repository contains example programs that show some of the features
-of Wikidata Toolkit.
+This is an example project that shows how to set up a Java project that
+uses [Wikidata Toolkit](https://github.com/Wikidata/Wikidata-Toolkit).
+It contains several simple example programs and bots in the source directory.
 
-Overview and Settings
----------------------
-
-A detailed guide to each of the examples is given below. Many examples process data
-dumps exported by Wikidata. In most cases, the example only contains the actual
-processing code that does something interesting. The code for downloading dumps and
-iterating over them is in the ExampleHelpers.java class, which is used in many examples
-for common tasks.
-
-You can edit the static members in ExampleHelpers to select which dumps should be
-used (the data is available in several formats which may be more or less recent
-and more or less comprehensive). You can also switch to offline mode there: then
-only the files downloaded previously will be used. This is convenient for testing
-to avoid downloading new files when you don't really need absolutely current data.
-By default, the code will fetch the most recent JSON dumps from the Web.
-
-Some examples write their output to files. These files are put into the subdirectory
-"results" under the directory from where the application is run. Files in CSV
-format can be loaded in any spreadsheet tool to make diagrams, for example.
-
-Guide to the Available Examples
+What's found in this repository
 -------------------------------
 
-Ordered roughly from basic to advanced/specific.
-
-#### EntityStatisticsProcessor.java ####
-
-This program processes entities (items and properties) to collect some basic
-statistics. It counts how many items and properties there are, the number of labels,
-descriptions, and aliases, and the number of statements. This code might be useful
-to get to know the basic data structures where these things are stored. The example
-also counts the usage of each property in more details: its use in the main part
-of statements, in qualifiers, and in references is counted separately. The results
-for this are written into a CSV file in the end.
-
-#### FetchOnlineDataExample.java ####
-
-This program shows how to fetch live data from wikidata.org via the Web API. This can
-be used with any other Wikibase site as well. It is not practical to fetch all data
-in this way, but it can be very convenient to get some data directly even when processing
-a dump (since the dump can only be read in sequence).
-
-#### EditOnlineDataExample.java ####
-
-This program shows how to create and modify live data on test.wikidata.org via the Web API.
-This can be used with any other Wikibase site as well. The example first creates a new item
-with some starting data, then adds some additional statements, and finally modifies and
-deletes existing statements. All data modifications automatically use revision ids to make
-sure that no edit conflicts occur (and we don't modify/delete data that is different from
-what we expect).
-
-#### LocalDumpFileExample.java ####
-
-This program shows how to process a data dump that is available in a local file, rather
-than being automatically downloaded (and possibly cached) from the Wikimedia site.
-
-#### GreatestNumberProcessor.java ####
-
-This simple program looks at all values of a number property to find the item with the
-greatest value. It will print the result to the console. In most cases, the item with
-the greatest number is fairly early in the data export, so watching the program work is
-not too exciting, but it shows how to read a single property value to do something with
-it. The property that is used is defined by a constant in the code and can be changed to
-see some other greatest values.
-
-#### LifeExpectancyProcessor.java ####
-
-This program processes items to compute the average life expectancy of people on
-Wikidata. It shows how to get details (here: year numbers) of specific statement values
-for specific properties (here we use Wikidata's P569 "birth date" and P570 "death date").
-The results are stored in a CSV file that shows average life expectancy by year of
-birth. The overall average is also printed to the output.
-
-#### WorldMapProcessor.java ####
-
-This program generates images of world maps based on the locations of Wikidata items,
-and stores the result in PNG files. The example builds several maps, for Wikidata as
-a whole and for several big Wikipedias (counting only items with an article in there).
-The code offers easy-to-adjust parameters for the size of the output images, the
-Wikimedia projects to consider, and the scale of the color values.
-
-[Wikidata world maps for June 2015](https://ddll.inf.tu-dresden.de/web/Wikidata/Maps-06-2015/en)
-
-#### GenderRatioProcessor.java ####
-
-This program uses Wikidata to analyse the number of articles that exist on certain
-topics in different Wikimedia projects (esp. in Wikipedias). In particular, it counts
-the number of articles about humans and humans of a specific gender (female, male, etc.).
-Can be used to estimate the gender balance of various Wikipedias. The results are stored
-in a CSV file (all projects x all genders), but for the largest projects they are also
-printed to the output. This example is inspired by Max Klein's work on this topic.
-
-[Related blog post by Max Klein](http://notconfusing.com/sex-ratios-in-wikidata-part-iii/)
-
-#### JsonSerializationProcessor.java ####
-
-This program creates a JSON file that contains English language terms, birthdate, occupation,
-and image for all people on Wikidata who were born in Dresden (the code can easily be
-modified to make a different selection). The example shows how to serialize Wikidata Toolkit
-objects in JSON, how to select item documents by a property, and how to filter documents to
-ignore some of the data. The resulting file is small (less than 1M).
-
-#### SitelinksExample.java ####
+The individual examples are documented in the README file of each package.
 
-This program shows how to get information about the site links that are used in Wikidata
-dumps. The links to Wikimedia projects use keys like "enwiki" for English Wikipedia or
-"hewikivoyage" for Hebrew WikiVoyage. To find out the meaning of these codes, and to
-create URLs for the articles on these projects, Wikidata Toolkit includes some simple
-functions that download and process the site links information for a given project.
-This example shows how to use this functionality.
 
-#### ClassPropertyUsageExample.java ####
+Running examples using an IDE
+-----------------------------
 
-This advanced program analyses the use of properties and classes on Wikidata, and creates
-output that can be used in the [Miga data browser](http://migadv.com/). You can see the
-result online at http://tools.wmflabs.org/wikidata-exports/miga/. The program is slightly
-more complex, involving several processing steps and additional code for formatting output
-for CSV files.
+You can import the project into any Java IDE that supports Maven (and maybe git)
+and run the example programs from there. Wikidata Toolkit provides detailed
+[instructions on how to set up Eclipse for using Maven and git](https://www.mediawiki.org/wiki/Wikidata_Toolkit/Eclipse_setup).
 
-#### RdfSerializationExample.java ####
 
-This program creates an RDF export. You can also do this directly using the command line
-client. The purpose of this program is just to show how this could be done in code, e.g.,
-to implement additional pre-processing before the RDF serialisation.
+Running examples directly using Maven
+-------------------------------------
 
+You can also run the code directly using Maven from the command line. For this,
+you need to have Maven and (obviously) Java installed. To compile the project
+and obtain necessary dependencies, run
 
-Other Helper Code
------------------
+```mvn compile```
 
-#### ExampleHelpers.java ####
+Thereafter, you can run any individual example using its Java class name, for
+example:
 
-This class provides static helper methods to iterate through dumps, to configure the
-desired logging behaviour, and to write files to the "results" directory. It also allows
-you to change some global settings that will affect most examples. The code is of interest
-if you want to find out how to build a standalone application that includes all aspects
-without relying on the example module.
+```mvn exec:java -Dexec.mainClass="examples.FetchOnlineDataExample"```
 
-#### EntityTimerProcessor.java ####
+Credits and License
+-------------------
 
-This is a helper class that is used in all examples to print basic timer information and
-to provide support for having a timeout (cleanly abort processing after a fixed time, even
-if the dump would take much longer to complete; useful for testing). It should not be of
-primary interest for learning how to use Wikidata Toolkit, but you can have a look to find
-out how to use our Timer class.
+This project is copied from the [Wikidata Toolkit](https://github.com/Wikidata/Wikidata-Toolkit) examples module.
+Authors can be found there.
 
-Additional Resources
---------------------
+License: [Apache 2.0](LICENSE)
 
-* [Wikidata Toolkit homepage](https://www.mediawiki.org/wiki/Wikidata_Toolkit)
-* [Wikidata Toolkit Javadocs](http://wikidata.github.io/Wikidata-Toolkit/)
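
The LifeExpectancyProcessor description removed in this diff mentions averaging life expectancy by year of birth, writing the result to a CSV file, and printing the overall average. A minimal sketch of that aggregation step is given here for orientation. It deliberately has no Wikidata Toolkit dependency: the class name `LifeExpectancySketch`, its methods, the CSV column names, and the sample years are all hypothetical and are not taken from the examples module, where the years would instead be read from P569 and P570 statement values in a dump.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: not the code of LifeExpectancyProcessor.java.
public class LifeExpectancySketch {

    // Maps birth year -> {sum of life spans in years, number of people}.
    // A TreeMap keeps the CSV rows sorted by birth year.
    private final Map<Integer, long[]> byBirthYear = new TreeMap<>();

    /** Records one person from a birth year and a death year. */
    public void addPerson(int birthYear, int deathYear) {
        long[] entry = byBirthYear.computeIfAbsent(birthYear, y -> new long[2]);
        entry[0] += deathYear - birthYear;
        entry[1]++;
    }

    /** Average life span over all recorded people, or 0.0 if there are none. */
    public double overallAverage() {
        long sum = 0;
        long count = 0;
        for (long[] entry : byBirthYear.values()) {
            sum += entry[0];
            count += entry[1];
        }
        return count == 0 ? 0.0 : (double) sum / count;
    }

    /** Renders one CSV row per birth year (column names are an assumption). */
    public String toCsv() {
        StringBuilder csv = new StringBuilder("birth year,average life span,people\n");
        for (Map.Entry<Integer, long[]> row : byBirthYear.entrySet()) {
            long[] entry = row.getValue();
            csv.append(row.getKey()).append(',')
               .append((double) entry[0] / entry[1]).append(',')
               .append(entry[1]).append('\n');
        }
        return csv.toString();
    }

    public static void main(String[] args) {
        LifeExpectancySketch stats = new LifeExpectancySketch();
        // Made-up sample data, for illustration only.
        stats.addPerson(1900, 1970);
        stats.addPerson(1900, 1980);
        stats.addPerson(1950, 2010);
        System.out.print(stats.toCsv());
        System.out.println("Overall average: " + stats.overallAverage());
    }
}
```

Since the sketch is a single class without dependencies, it can be tried with `javac LifeExpectancySketch.java` followed by `java LifeExpectancySketch`, independently of the Maven setup described in the new README.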