Wikidata

   Wikidata is a large collaborative project (a sister project of
   [1]Wikipedia, hosted by the Wikimedia Foundation) for creating a huge
   noncommercial [2]public domain [3]database containing information about
   basically everything. Well, not literally everything -- there are some
   rules about what can be included that are similar to those on
   [4]Wikipedia, e.g. notability (you can't add yourself unless you're
   notable enough, and of course you can't add illegal data etc.). Wikidata
   records data in the form of a so-called [5]knowledge graph, i.e. it
   connects items and their properties with statements such as
   "Earth:location:inner Solar System", creating a mathematical structure
   called a [6]graph. The whole database is available to anyone for any
   purpose without any conditions, under [7]CC0!

   Wikidata is incredibly useful and a bit unfairly overlooked in the shadow
   of its giant sibling Wikipedia, even though it offers a way to easily
   obtain large, absolutely [8]free and public domain data sets about
   anything. The database can be queried with specialized languages, so one
   can, let's say, obtain the coordinates of all terrorist attacks that
   happened in a certain time period, get a list of famous male cats,
   visualize the tree of biological species, list Jews who run restaurants
   in Asia or do any other crazy thing. Wikidata oftentimes contains extra
   information that's not
   present in the Wikipedia article about the item and that's not even
   quickly found by [9]googling, and the information is at times also backed
   by sources just like on Wikipedia, so it's nice to always check Wikidata
   when researching anything.

   Wikidata was opened on 29 October 2012. The first data that were stored
   were links between different language versions of Wikipedia articles;
   later Wikipedia started to use Wikidata to store information to display in
   infoboxes in articles, and so Wikidata grew and eventually became a
   database of its own. As of 2022 there are a little over 100 million items,
   over 1 billion statements and over 20000 active users.

Database Structure

   The database is a [10]knowledge graph. It stores the following kinds of
   records:

     * entities: Specific "things", concrete or abstract, that exist and are
       stored in the database. Each one has a unique [11]ID, name (not
       necessarily unique), description and optional aliases (alternative
       names).
          * items: Objects of the real world; their ID is a number prepended
            with the letter Q, e.g. [12]dog (Q144), [13]Earth (Q2), idea
            (Q131841) or [14]Holocaust (Q2763).
          * properties: Attributes that items may possess, their ID is a
            number prepended with the letter P, e.g. instance of (P31), mass
            (P2067) or image (P18). Properties may have constraints (created
            via statements), for example on values they may take.
     * statements: Information about items and properties, which may link
       items/properties (entities) with other items/properties. A statement
       is a so-called triple: it contains a subject (item/property), verb
       (property) and object (value, e.g. item/property, number, string,
       ...). I.e. a statement is a record of the form entity:property:value,
       for example dog(Q144):subclass of(P279):domestic mammal(Q57814795).
       An entity may have multiple values for one property (by having
       multiple statements about the entity with the same property), for
       example a man may have multiple nationalities etc. Statements may also
       optionally include qualifiers that further specify details about the
       statement (for example the time at which it holds) and references
       that specify the source of the data.
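
   The triple structure described above can be mimicked in a few lines of
   Python. This is only a toy sketch for illustration: the IDs are the ones
   quoted in this article, while the tuple layout and the helper names
   (values_of, index_by_subject) are made up here and have nothing to do
   with how Wikidata stores its data internally.

```python
from collections import defaultdict

# Statements as (subject, property, value) triples. The IDs come from the
# text; the in-memory layout is an assumption made for this example only.
statements = [
    ("Q144", "P279", "Q57814795"),  # dog : subclass of : domestic mammal
    ("Q144", "P279", "Q39201"),     # dog : subclass of : pet
    ("Q155695", "P31", "Q144"),     # Blondi : instance of : dog
]

def values_of(statements, subject, prop):
    """Return all values that `subject` has for `prop`, in stored order."""
    return [v for (s, p, v) in statements if s == subject and p == prop]

def index_by_subject(statements):
    """Group statements per subject: {subject: {property: [values]}}."""
    index = defaultdict(lambda: defaultdict(list))
    for s, p, v in statements:
        index[s][p].append(v)
    return index
```

   Note how dog (Q144) simply appears in two statements with the same
   property, which is how one property gets multiple values.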

   The most important properties are probably instance of (P31) and subclass
   of (P279), which put items into [15]sets/classes and establish
   subsets/subclasses. The instance of property says that the item is an
   individual manifestation of a certain class (just like in [16]OOP); we can
   usually substitute it with the word "is", for example Blondi (Q155695,
   [17]Hitler's dog) is an instance of dog (Q144). Note that an item can be
   an instance of multiple classes at the same time. The subclass of
   property says that a certain class is a subclass of another, e.g. dog
   (Q144) is a subclass of pet (Q39201), which is further a subclass of
   domestic animal (Q622852) etc. Also note that an item can be both an
   instance and a class.
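
   To see how the two properties combine, here is a minimal Python sketch
   that answers "is this item a kind of X?" by taking one instance of step
   and then following subclass of links as far as they go. The two
   dictionaries are hand-written toy copies of the relations named above,
   and the function names are hypothetical. In SPARQL the same idea is
   usually written with a property path such as wdt:P31/wdt:P279*.

```python
# Toy copies of the two relations, using the IDs mentioned in the text.
INSTANCE_OF = {"Q155695": {"Q144"}}   # Blondi -> dog
SUBCLASS_OF = {
    "Q144": {"Q39201"},               # dog -> pet
    "Q39201": {"Q622852"},            # pet -> domestic animal
}

def superclasses(cls, subclass_of):
    """Return cls together with every class reachable via subclass of."""
    seen, stack = set(), [cls]
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # guards against cycles in the data
        seen.add(current)
        stack.extend(subclass_of.get(current, ()))
    return seen

def is_a(item, cls, instance_of=INSTANCE_OF, subclass_of=SUBCLASS_OF):
    """True if item is an instance of cls or of any subclass of cls."""
    return any(cls in superclasses(direct, subclass_of)
               for direct in instance_of.get(item, ()))
```

   With this toy data Blondi counts as a dog, a pet and a domestic animal,
   even though only the first of these is stored directly.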

How To

   There are many [18]libraries/[19]APIs for Wikidata you can use. Unlike
   shitty corporations that guard their data by force, Wikidata provides data
   in friendly ways -- you can even download the whole Wikidata database in
   [20]JSON format (about 100 GB).
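
   A file that big should not be loaded into memory at once. The following
   is a rough sketch of streaming it entity by entity, ASSUMING the dump is
   laid out as one big JSON array with a single entity per line and each
   entity carrying its labels under a "labels" key (check the current dump
   documentation before relying on this). The function names are made up
   and the sample string stands in for the real file.

```python
import io
import json

def iter_entities(lines):
    """Yield one entity dict per line of a dump-like text stream."""
    for line in lines:
        line = line.strip().rstrip(",")
        if line in ("", "[", "]"):
            continue  # skip the array brackets and blank lines
        yield json.loads(line)

def english_label(entity):
    """Return the English label, or None if the entity has none."""
    return entity.get("labels", {}).get("en", {}).get("value")

# Tiny stand-in for the real file so the sketch runs without a download.
sample = io.StringIO(
    '[\n'
    '{"type":"item","id":"Q2","labels":{"en":{"language":"en","value":"Earth"}}},\n'
    '{"type":"item","id":"Q144","labels":{"en":{"language":"en","value":"dog"}}}\n'
    ']\n'
)
labels = {e["id"]: english_label(e) for e in iter_entities(sample)}
```

   For a real compressed dump the same generator could be fed from
   something like bz2.open(path, "rt") or gzip.open(path, "rt"), where path
   is wherever you saved the file.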

   The easiest way to retrieve just the data you are interested in is
   probably going to the online query interface
   (https://query.wikidata.org/), entering a query (in the [21]SPARQL
   language, similar to [22]SQL) and then clicking download -- you can
   choose from several formats, e.g. [23]JSON, [24]CSV etc. The result can
   then be processed
   further with whatever language or tool, be it [25]Python, [26]LibreOffice
   Calc etc.
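
   As an example of such processing, here is a small Python sketch that
   reads a downloaded CSV file and strips the entity URLs down to bare IDs.
   It assumes the file has the columns item and itemLabel (which is what a
   query selecting ?item ?itemLabel produces) and that items are exported
   as full http://www.wikidata.org/entity/... URLs. The function name and
   the inline sample are only illustrative.

```python
import csv
import io

def read_items(csv_file):
    """Return (ID, label) pairs from a CSV with item,itemLabel columns."""
    pairs = []
    for row in csv.DictReader(csv_file):
        entity_id = row["item"].rsplit("/", 1)[-1]  # keep only e.g. "Q144"
        pairs.append((entity_id, row["itemLabel"]))
    return pairs

# Stand-in for a downloaded file; with a real one use open(path, newline="").
sample = io.StringIO(
    "item,itemLabel\n"
    "http://www.wikidata.org/entity/Q144,dog\n"
    "http://www.wikidata.org/entity/Q2,Earth\n"
)
items = read_items(sample)
```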

   BEWARE: the query you enter may easily take a long time to execute and
   time out, so you need to write it efficiently, which for more complex
   queries may be difficult if you're not familiar with SPARQL. However,
   Wikidata offers online tips on [27]optimization of queries and there are
   many examples right in the online interface which you can just modify to
   suit your needs.

   Here are some examples of possible queries. The following one selects
   video [28]games of the [29]FPS genre:

 SELECT ?item ?itemLabel WHERE
 {
   ?item wdt:P31 wd:Q7889.    # item is video game and
   ?item wdt:P136 wd:Q185029. # item is FPS
  
   # this gets the item label:
   SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
 }
 LIMIT 100 # limit to 100 results, make the query faster
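
   The query above does not have to be pasted into the web interface, it
   can also be sent from a script. Below is a hedged sketch using only the
   Python standard library. It assumes the public endpoint at
   https://query.wikidata.org/sparql accepts the query and format=json as
   GET parameters and answers in the standard SPARQL JSON result layout.
   The helper names and the User-Agent string are placeholders, and the
   label language is set to plain "en" on the assumption that
   [AUTO_LANGUAGE] is only filled in by the web interface.

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "https://query.wikidata.org/sparql"

QUERY = """
SELECT ?item ?itemLabel WHERE
{
  ?item wdt:P31 wd:Q7889.
  ?item wdt:P136 wd:Q185029.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
LIMIT 100
"""

def build_url(query):
    """Encode the query into a GET URL asking for JSON results."""
    params = urllib.parse.urlencode({"query": query, "format": "json"})
    return ENDPOINT + "?" + params

def parse_bindings(payload):
    """Flatten a SPARQL JSON result into a list of {variable: value} dicts."""
    return [{var: cell["value"] for var, cell in row.items()}
            for row in payload["results"]["bindings"]]

def run(query, user_agent="example-script/0.1 (placeholder contact)"):
    """Send the query over the network and return the flattened rows."""
    request = urllib.request.Request(build_url(query),
                                     headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request) as response:
        return parse_bindings(json.load(response))
```

   Calling run(QUERY) would then return a list of dictionaries with the
   keys item and itemLabel. Only run touches the network, so the other two
   functions can be tried out offline.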

   Another query may be this one: select [30]black holes along with their
   mass (where known):

 SELECT ?item ?itemLabel ?mass WHERE
 {
   { ?item wdt:P31 wd:Q589. } # instances of black hole
   UNION
   { ?item wdt:P31 ?class. # instance of black hole subclass (e.g. supermassive black hole, ...)
     ?class wdt:P279 wd:Q589. }

   OPTIONAL { ?item wdt:P2067 ?mass }
  
   SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
 }
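
   When the result of such a query is downloaded as JSON, rows where the
   OPTIONAL part found nothing simply lack the mass key, so the processing
   code has to expect that. A minimal sketch follows, assuming the standard
   SPARQL JSON result layout. The function name is made up and the sample
   payload is entirely fictional (placeholder URLs, labels and number), it
   is NOT real Wikidata content.

```python
def mass_rows(payload):
    """Return (item URI, label, mass or None) for each result row."""
    rows = []
    for row in payload["results"]["bindings"]:
        # An unbound OPTIONAL variable is absent from the row altogether.
        mass = float(row["mass"]["value"]) if "mass" in row else None
        rows.append((row["item"]["value"], row["itemLabel"]["value"], mass))
    return rows

# Made-up payload: placeholder entities, one with a mass and one without.
example = {"results": {"bindings": [
    {"item": {"type": "uri", "value": "http://example.org/entity/A"},
     "itemLabel": {"type": "literal", "value": "placeholder A"},
     "mass": {"type": "literal", "value": "10.5"}},
    {"item": {"type": "uri", "value": "http://example.org/entity/B"},
     "itemLabel": {"type": "literal", "value": "placeholder B"}},
]}}
rows = mass_rows(example)
```

   Keep in mind that the number obtained this way probably comes without
   its unit, so values of different items may not be directly comparable.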

Links:
1. wikipedia.md
2. public_domain.md
3. database.md
4. wikipedia.md
5. knowledge_graph.md
6. graph.md
7. cc0.md
8. free_culture.md
9. google.md
10. knowledge_graph.md
11. id.md
12. dog.md
13. earth.md
14. holocaust.md
15. set.md
16. oop.md
17. hitler.md
18. library.md
19. api.md
20. json.md
21. sparql.md
22. sql.md
23. json.md
24. csv.md
25. python.md
26. libreoffice.md
27. optimization.md
28. game.md
29. fps.md
30. black_hole.md