How to remove duplicate GVKEY-DATADATE pairs when using Compustat Annual (FUNDA) and Quarterly (FUNDQ)?

The annual data is easy to deal with. You just need to add the following conditions:

indfmt=="INDL" & datafmt=="STD" & popsrc=="D" & consol=="C"

If you have downloaded FUNDA and converted it into Stata format, the uniqueness of GVKEY-DATADATE can be verified by a Stata command:

duplicates report gvkey datadate if indfmt=="INDL" & datafmt=="STD" & popsrc=="D" & consol=="C"

This command will report no duplicates.
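As a minimal sketch of how the filter and the uniqueness check might fit into a script, the lines below keep only the filtered records and then assert uniqueness. The file name funda.dta is a hypothetical local copy of FUNDA and is not taken from the text above.

```stata
* funda.dta is a hypothetical local copy of FUNDA (assumption)
use funda.dta, clear

* Keep only the standard records, using the filter from the text
keep if indfmt=="INDL" & datafmt=="STD" & popsrc=="D" & consol=="C"

* isid stops with an error unless gvkey-datadate uniquely identifies
* the remaining observations (it also rejects missing key values)
isid gvkey datadate
```

Using isid rather than duplicates report makes a do-file stop as soon as the assumption fails, which may be convenient in a longer pipeline.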

The quarterly data is a little more complicated. However, my test shows that as of December 5, 2017, after applying the same filter as above, 99.84% of GVKEY-DATADATE pairs on FUNDQ are unique. This means that no matter how you deal with the duplicates, even if you simply delete all of them, your results will probably not be affected in any noticeable way.
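One way such a share could be measured is sketched below. The file name fundq.dta is a hypothetical local copy of FUNDQ. The calculation counts records rather than distinct pairs, so it may differ slightly from the figure quoted above.

```stata
* fundq.dta is a hypothetical local copy of FUNDQ (assumption)
use fundq.dta, clear
keep if indfmt=="INDL" & datafmt=="STD" & popsrc=="D" & consol=="C"

* dup = number of other records sharing the same gvkey-datadate
* (0 means the record is unique)
duplicates tag gvkey datadate, generate(dup)

* Percentage of records whose gvkey-datadate is unique
count if dup == 0
display "Unique share (%): " %6.2f (100*r(N)/_N)
```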

That said, if you want to do something about them, WRDS gives a clue. In the definition of datafqtr, WRDS notes:

Note: Companies that undergo a fiscal-year change may have multiple records with the same datadate. Compustat delivers those multiple records with the same datadate but each record relates to a different fiscal year-end period.

Rule: Select records from the co_idesind data group where datafqtr is not null, to view as fiscal data.
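One possible reading of this rule in Stata is sketched below. It assumes that fundq.dta is a hypothetical local copy of FUNDQ and that datafqtr is present in it. The second command reports whether any duplicates remain.

```stata
* fundq.dta is a hypothetical local copy of FUNDQ (assumption)
use fundq.dta, clear
keep if indfmt=="INDL" & datafmt=="STD" & popsrc=="D" & consol=="C"

* Apply the WRDS rule: keep records where datafqtr is not null
* (missing() handles both numeric and string variables)
keep if !missing(datafqtr)

* Check whether duplicate gvkey-datadate pairs remain
duplicates report gvkey datadate
```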

Unfortunately, I find that this tip does not work in my test, so I came up with another way. The root cause of duplicate GVKEY-DATADATEs is firms changing their fiscal year end. For example, if a firm changes its fiscal year end from September 30 to December 31 when releasing its September 30 financial statements, there will be two records for September 30 on FUNDQ: one for fiscal quarter 4 based on the original year end, and the other for fiscal quarter 3 based on the new year end. We can elect to keep the GVKEY-DATADATE record that represents the fiscal quarter based on the new fiscal year end. This can be done by imposing a new condition, fyr==fyrc, on duplicate GVKEY-DATADATEs. fyr represents the then-current fiscal year end and is in FUNDQ; fyrc represents the most recent fiscal year end and can be extracted from the following dataset:


Please note there may be a minor problem: if a firm changes its fiscal year end more than once, we may lose some GVKEY-DATADATEs completely. But again, this probably does not matter at all.
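The fyr and fyrc approach described above could be sketched as follows. This is only an illustration under stated assumptions: fundq.dta is a hypothetical local copy of FUNDQ, and company.dta is a hypothetical file with one row per gvkey that contains fyrc. Neither file name comes from the text, and gvkey must have the same storage type in both files.

```stata
* fundq.dta and company.dta are hypothetical local files (assumption)
use fundq.dta, clear
keep if indfmt=="INDL" & datafmt=="STD" & popsrc=="D" & consol=="C"

* Attach each firm's most recent fiscal year end (fyrc)
merge m:1 gvkey using company.dta, keepusing(fyrc) keep(master match) nogenerate

* Flag records that share a gvkey-datadate with at least one other record
duplicates tag gvkey datadate, generate(dup)

* Among duplicated pairs only, drop records on an old fiscal year end.
* Duplicated records with a missing fyrc are dropped too
drop if dup > 0 & fyr != fyrc
drop dup

* See what, if anything, is still duplicated
duplicates report gvkey datadate
```

Restricting the drop to dup > 0 leaves unique records untouched even when their fyr differs from fyrc. As the caveat above notes, a pair can disappear entirely if none of its records matches fyrc.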


One Response to How to remove duplicate GVKEY-DATADATE pairs when using Compustat Annual (FUNDA) and Quarterly (FUNDQ)?

  1. YANLEI ZHANG says:

    Hi Kai, I think it depends on what you need: it could be either the calendar quarter or the fiscal quarter. I usually try to collect the information for each calendar quarter. In that case, I only keep the most recent convention for the duplicated observations.
