Home
On this wiki we collect tips and tricks for some widely used datasets that go beyond the official documentation. Most datasets have their quirks, and learning about them takes a substantial time investment. The aim of this wiki is to reduce that startup time.
Contents:
- DADS panel tous salariés with EDP (paneledp2016)
- French Customs Data
- French Business Tax Data
The official docs are somewhat vague about which version of the PCS4 classification is used. Given that the panel runs from 1978 up to today, there are several potential candidates. The variable search on paneledp2016 for PCS4 links to this INSEE page, which is for the 2003 version of the PCS4. However, the variable itself contains values like 3300 or 5800, which are not part of that classification. Since those entries are more prevalent in earlier years, what do these codes mean? Here is an online search for the 2003 classification.
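One way to gauge how widespread the problem is: tabulate, year by year, the share of PCS4 values that do not appear in the 2003 classification. The sketch below is plain Python and assumes you have already extracted (year, PCS4) pairs as strings and built the set of valid 2003 codes yourself from the INSEE list. The function name and the placeholder codes "AAAA" and "BBBB" are hypothetical; only 3300 and 5800 come from the text above.

```python
from collections import defaultdict


def share_outside_classification(records, valid_codes):
    """Return {year: share of PCS4 codes not found in valid_codes}.

    records: iterable of (year, pcs4_code) pairs, codes stored as strings.
    valid_codes: set of codes from the reference classification
    (e.g. the 2003 PCS4 list, which has to be supplied by the user).
    """
    totals = defaultdict(int)
    outside = defaultdict(int)
    for year, code in records:
        totals[year] += 1
        if code not in valid_codes:
            outside[year] += 1
    # Years are returned in ascending order.
    return {year: outside[year] / totals[year] for year in sorted(totals)}


# Hypothetical placeholders, NOT real PCS4 codes: stand-ins for the 2003 list.
valid_codes = {"AAAA", "BBBB"}
sample = [
    (1980, "3300"), (1980, "AAAA"),
    (1995, "5800"), (1995, "AAAA"), (1995, "BBBB"), (1995, "BBBB"),
    (2010, "AAAA"), (2010, "BBBB"),
]
shares = share_outside_classification(sample, valid_codes)
print(shares)  # {1980: 0.5, 1995: 0.25, 2010: 0.0}
```

If the share drops sharply around a particular year, that year is a natural place to look for a change of classification.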
A nice document by Bergounhon, Lenoir and Mejean on how to use French Customs Data, with a special focus on longitudinal analyses: http://www.isabellemejean.com/BergounhonLenoirMejean_2018.pdf. Link to the companion website: http://isabellemejean.com/FrenchCustomsData.html
When working with firms' tax return data (FARE, FICUS, BRN, etc.), you may have had trouble understanding what a firm is, especially if you work on large firms. In these datasets, the firm's unique identifier is called "SIREN". "SIRET" numbers identify establishments: a SIRET code is the concatenation of the SIREN number (9 digits) and a NIC number (5 digits). A SIREN can have several SIRETs, but each SIRET belongs to exactly one SIREN.
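Because the SIREN is simply the first nine characters of the SIRET, establishment-level records can be linked to their firm by string slicing. A minimal sketch in plain Python; the function names and the identifiers in the example are hypothetical. It assumes identifiers are kept as strings (converting them to numbers would drop any leading zeros) and tolerates spaces used as visual separators.

```python
from collections import defaultdict


def split_siret(siret):
    """Split a 14-digit SIRET into (SIREN, NIC): 9 digits + 5 digits."""
    siret = siret.replace(" ", "")  # tolerate "123 456 789 00011"-style input
    if len(siret) != 14 or not siret.isdigit():
        raise ValueError(f"expected 14 digits, got {siret!r}")
    return siret[:9], siret[9:]


def establishments_by_firm(sirets):
    """Group (normalized) SIRET codes under their parent SIREN.

    Each SIRET maps to exactly one SIREN; a SIREN may collect several SIRETs.
    """
    firms = defaultdict(list)
    for siret in sirets:
        siren, nic = split_siret(siret)
        firms[siren].append(siren + nic)
    return dict(firms)


# Hypothetical identifiers, for illustration only.
firms = establishments_by_firm(
    ["12345678900011", "123 456 789 00029", "98765432100015"]
)
print(firms)
# {'123456789': ['12345678900011', '12345678900029'], '987654321': ['98765432100015']}
```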
Over the past decades, INSEE has pushed for a new definition of the firm. The "SIREN" definition of a firm is a legal one (a SIREN corresponds to an "unité légale", i.e. a legal unit). However, to better understand how firms, and especially large firms, actually operate, INSEE has started to group SIRENs into "economically meaningful" entities. See https://www.insee.fr/fr/metadonnees/source/fichier/imet130.pdf (page 248, in French) for the rationale behind this "profilage" (profiling). The first five groups profiled by INSEE were PSA, Renault, Accor, St Gobain and SEB. They are often referred to as "profilés historiques" (historical profiled firms), and their SIREN numbers start with "P".

As a result, firms' tax return data usually include two dummy variables, diff_ul (ul = unité légale, legal unit) and diff_ep (ep = entreprise profilée, profiled firm), which combine as follows:
- diff_ep == 1 & diff_ul == 1: historical profiled firms (PSA, Renault, Accor, SEB and St Gobain)
- diff_ep == 0 & diff_ul == 1: legal units that are part of a profiled firm
- diff_ep == 1 & diff_ul == 0: profiled firms
- diff_ep == 0 & diff_ul == 0: other firms (too small to have a profiled version)

A given firm can therefore appear twice in the same dataset: once in its legal-unit form and once in its profiled form.
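The table of dummies above translates directly into a lookup, and the double-counting issue suggests one way to build a "profiled" view of the data: drop the legal units that belong to a profiled firm. A minimal sketch in plain Python, assuming each row is a dict with integer diff_ep and diff_ul fields; the function names, the "id" field and its values are hypothetical. Treating the diff_ep == 0 & diff_ul == 1 rows as the duplicates is an assumption drawn from the description above, so check it against the dataset documentation before relying on it.

```python
def firm_type(diff_ep, diff_ul):
    """Map the (diff_ep, diff_ul) dummies to the category they encode."""
    categories = {
        (1, 1): "historical profiled firm",
        (0, 1): "legal unit belonging to a profiled firm",
        (1, 0): "profiled firm",
        (0, 0): "other firm",
    }
    return categories[(int(diff_ep), int(diff_ul))]


def profiled_view(rows):
    """Keep one row per economic entity (assumption: see lead-in).

    Drops legal units that are part of a profiled firm (diff_ep == 0 and
    diff_ul == 1), since their activity is assumed to be reported again
    under the profiled firm (diff_ep == 1 and diff_ul == 0).
    """
    return [r for r in rows if firm_type(r["diff_ep"], r["diff_ul"])
            != "legal unit belonging to a profiled firm"]


# Hypothetical rows, for illustration only.
rows = [
    {"id": "P_hist", "diff_ep": 1, "diff_ul": 1},
    {"id": "unit_1", "diff_ep": 0, "diff_ul": 1},
    {"id": "unit_2", "diff_ep": 0, "diff_ul": 1},
    {"id": "group_X", "diff_ep": 1, "diff_ul": 0},
    {"id": "small_firm", "diff_ep": 0, "diff_ul": 0},
]
kept = profiled_view(rows)
print([r["id"] for r in kept])  # ['P_hist', 'group_X', 'small_firm']
```

For a legal-unit view, the mirror-image filter (dropping diff_ep == 1 & diff_ul == 0) would apply under the same assumption.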