curvoy edited this page Jul 7, 2020 · 3 revisions

ScPoEcon Datasets Registry

On this wiki we collect tips and tricks for some widely used datasets that go beyond the official documentation. Most datasets have their quirks, and learning them takes a substantial time investment. The aim of this wiki is to reduce that startup time.

Contents:

  1. DADS panel tous salaries with EDP (paneledp2016)
  2. French Customs Data
  3. French Business Tax Data

DADS panel tous salaries with EDP (paneledp2016)

Contents of variable PCS4

The official docs are vague about which version of the PCS4 classification is used. Given that the panel runs from 1978 to today, there are several candidates. The variable search on paneledp2016 for PCS4 links to this INSEE page, which covers the 2003 version of the PCS4. However, the variable itself contains values such as 3300 or 5800, which are not part of that classification. Those entries are more prevalent in earlier years, so the open question is: what do those codes mean? Here is an online search for the 2003 classification.
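One way to quantify the issue is to tabulate, year by year, the share of PCS4 values that do not appear in the 2003 classification. The sketch below is only illustrative: the function name `share_outside_classification` is hypothetical, the records are made up, and `known_2003` is a placeholder set that would have to be replaced by the full official list of 2003 codes.

```python
from collections import defaultdict


def share_outside_classification(records, known_codes):
    """Return, for each year, the share of records whose code is not in known_codes."""
    totals = defaultdict(int)
    outside = defaultdict(int)
    for year, code in records:
        totals[year] += 1
        if code not in known_codes:
            outside[year] += 1
    return {year: outside[year] / totals[year] for year in sorted(totals)}


# Placeholder reference set (assumption): replace with the official 2003 code list.
known_2003 = {"382a", "621a"}

# Made-up (year, PCS4) pairs; 3300 and 5800 are the unexplained values mentioned above.
records = [
    (1980, "3300"), (1980, "5800"), (1980, "382a"), (1980, "621a"),
    (2010, "382a"), (2010, "621a"),
]

shares = share_outside_classification(records, known_2003)
```

A share that falls over time would be consistent with the observation that the unexplained codes are concentrated in the early years of the panel.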

French Customs Data

Bergounhon, Lenoir and Mejean wrote a useful guide on how to use French Customs Data, with a special focus on longitudinal analyses: http://www.isabellemejean.com/BergounhonLenoirMejean_2018.pdf

Link to the companion website: http://isabellemejean.com/FrenchCustomsData.html

French Business Tax Data

SIREN and SIRET

When working with firm tax return data (FARE, FICUS, BRN, etc.), you may have had trouble understanding what a firm is, especially if you work on large firms. In these datasets, a firm's unique identifier is called "SIREN". "SIRET" numbers identify establishments: a SIRET is the concatenation of the SIREN number (9 digits) and a NIC number (5 digits). There can be several SIRET per SIREN, but each SIRET belongs to exactly one SIREN.
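The identifier structure can be sketched in a few lines of Python. This is a minimal illustration rather than an official routine: the function name `split_siret` is hypothetical, the identifiers are made up, and the SIRET is assumed to be stored as a 14-character string (identifiers read as numbers can lose their leading zeros).

```python
from typing import Tuple


def split_siret(siret: str) -> Tuple[str, str]:
    """Split a 14-digit SIRET into its SIREN (first 9 digits) and NIC (last 5 digits)."""
    siret = siret.strip()
    # Assumption: purely numeric identifiers; other formats are rejected here.
    if len(siret) != 14 or not siret.isdigit():
        raise ValueError(f"expected 14 digits, got {siret!r}")
    return siret[:9], siret[9:]


# Made-up identifiers, for illustration only.
establishments = ["12345678900012", "12345678900020", "98765432100015"]

# Group establishments (SIRET) under their firm (SIREN):
# one SIREN can carry several NIC, each SIRET maps to a single SIREN.
by_siren = {}
for siret in establishments:
    siren, nic = split_siret(siret)
    by_siren.setdefault(siren, []).append(nic)
```

Keeping the identifiers as strings throughout avoids silently truncating a SIREN or NIC that begins with zero.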

Over the past decades, INSEE has pushed for a new definition of the firm. The "SIREN" definition of a firm is a legal one: a SIREN corresponds to an "unité légale" (legal unit). To better understand how firms operate, especially large ones, INSEE has started grouping SIREN into "economically meaningful" entities. See https://www.insee.fr/fr/metadonnees/source/fichier/imet130.pdf (page 248) for details on the rationale of "profilage" (in French). The first five groups profiled by INSEE are PSA, Renault, Accor, St Gobain and Seb. They are often referred to as "profilés historiques". Their SIREN number starts with "P".

As a result, firm tax return data usually include two dummy variables: diff_ul (ul = unité légale, legal unit) and diff_ep (ep = entreprise profilée, profiled firm). Their combinations read as follows:

  1. diff_ep == 1 & diff_ul == 1: "profilés historiques" (PSA, Renault, Accor, St Gobain and Seb)
  2. diff_ep == 0 & diff_ul == 1: legal units that are part of a "profiled firm"
  3. diff_ep == 1 & diff_ul == 0: profiled firm
  4. diff_ep == 0 & diff_ul == 0: other firms (too small to have a "profiled" version)

A given firm can therefore appear twice in a given dataset: once under its "legal unit" form and once under its "profiled" form.
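The four combinations of diff_ep and diff_ul can be encoded as a small lookup, which makes it easy to tabulate how many records of each kind a file contains. The sketch below is illustrative only: the function name `classify_unit`, the English labels and the sample records are assumptions, not part of the INSEE files.

```python
from collections import Counter

# Labels follow the four cases listed above; the English wording is ours.
CASES = {
    (1, 1): "historical profiled firm",
    (0, 1): "legal unit within a profiled firm",
    (1, 0): "profiled firm",
    (0, 0): "other firm",
}


def classify_unit(diff_ep, diff_ul):
    """Map the (diff_ep, diff_ul) pair of a record to one of the four cases."""
    try:
        return CASES[(diff_ep, diff_ul)]
    except KeyError:
        raise ValueError(
            f"unexpected flags: diff_ep={diff_ep}, diff_ul={diff_ul}"
        ) from None


# Made-up records, for illustration only.
records = [
    {"siren": "P00000001", "diff_ep": 1, "diff_ul": 0},
    {"siren": "111111111", "diff_ep": 0, "diff_ul": 1},
    {"siren": "222222222", "diff_ep": 0, "diff_ul": 1},
    {"siren": "333333333", "diff_ep": 0, "diff_ul": 0},
]

counts = Counter(classify_unit(r["diff_ep"], r["diff_ul"]) for r in records)
```

Tabulating the cases first helps decide which level of observation to keep, so that a group is not counted both through its legal units and through its profiled form. Which rows to drop depends on the analysis and is not settled by this sketch.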