Look into CRSP/COMPUSTAT link table

The link history table (CCMXPF_LNKHIST) is the main table is used for WRDS CCM web queries. So I focus on this link table.

BACKGROUND

We know that a company can issue multiple securities, of which one is considered primary for a company. On CRSP, this suggests that one PERMCO (company-lever identifier) can have multiple PERMNO (security-level identifier). Because CRSP collects data on security prices, we should consider CRSP as breaking down to the security level.

COMPUSTAT, in essence, provides financial data of every company. So, the micro unit on COMPUSTAT is each and every company. Previously, COMPUSTAT included one security per record (i.e., per company at a given time). Since mid-April in 2007, all securities are available on COMPUSTAT with a new identifier, IID, which can be used along with GVKEY to permanently identify all securities tracked by COMPUSTAT, and a marker item, PRIMISS, that identify the security that COMPUSTAT considers primary. Like PERMCO on CRSP, one GVKEY can have multiple IIDs.

Let me summarize the identifiers used in the linking process:

IdentifierDescription
GVKEYCompustat’s permanent company identifier.
IIDCompustat’s permanent issue identifier. An identifying relationship exists between IID and GVKEY. Both must be accessed as a pair to properly identify a Compustat security. One GVKEY can have multiple IIDs.
PRIMISSBecause Compustat company data ranges can extend earlier than security ranges, there may be some time periods with no identified IID for a GVKEY. In these cases, CRSP assigns a dummy IID ending in “X” as a placeholder in the link. This range may or may not be associated with a CRSP PERMNO, but there is no Compustat security data found during the range when no IID is assigned.

Compustat provides a primary marker indicating which security is considered primary for a company at a given time.
PERMCOCRSP’s permanent company identifier.
PERMNOCRSP’s permanent issue identifier. One PERMNO belongs to only one PERMCO. One PERMCO can have one or more PERMNOs.

The last piece of background information is that the link between CRSP and COMPUSTAT (at both company level and security level) can change over time.

THE LINKING PROCESS

Prior to the introduction of IID, Compustat included only one security per record. The links between CRSP and Compustat were between CRSP PERMNO and Compustat GVKEY. Because PERMNO is a security identifier and GVKEY is a company identifier, the linking could be a many to one relationship. More than one PERMNO may be linked to a single GVKEY.

Because Compustat security-level information is now available, CRSP started to build the security-level links in April 2007.

THE LINK HISTORY TABLE

The main product of CRSP’s laborious linking efforts is the link history table. This table is COMPUSTAT-centric, that is, this table is organized and identified by Compustat identifiers which are then linked to CRSP identifiers. All Compustat records are retained, regardless of whether or not securities (defined by GVKEY-IID) are in the CRSP universe.The following is a slice of the table (Please note IID, PERMCO, and PERMNO have the prefix “L” in the link history table.):

GVKEYLINKPRIMLIIDLINKTYPELPERMNOLPERMCOLINKDTLINKENDDT
COMPUSTAT global company keyPrimary link markerSecurity-level identifierLink type codeHistorical CRSP PERMNO link to COMPUSTAT recordHistorical CRSP PERMCO link to COMPUSTAT recordFirst effective date of linkLast effective date of link
10411P1LC63773523019811215.E
10411P1NR1974112919811214
10411J7LC9065552302005051620120131
10411J6NR2005042920060131
10411J2NR19940331.E
10411J4NR2002013120060131

LINKPRIM is a marker item that indicates whether a GVKEY-IID is a primary security. This marker is based on Compustat Primary/Joiner flag (PRIMISS). However, due to missing primary issue markers from COMPUSTAT for early history, calendar ranges of overlapping, and different treatment for US and Canadian security issues, CRSP overides COMPUSTAT’s primary issue markers in many cases. The purpose is to produce one primary security throughout the company history. “P” represents the primary security issue identified by COMPUSTAT, while “C” represents the primary security issue identified or overridden by CRSP. In most applications we only need the primary security link.

Another important item is LINKTYPE. In short, LC and LU are considered as accurate links. They are also the default link types used for WRDS CCM web queries. LX and LD are considered as “soft” links of low accuracy. Old merging sample codes also include LS in addition to LC and LU. But by definition below, I do not think LS should be included.

LINKDT and LINKENDDT are straightforward. They mark the period during which the link is valid.

Please see the detailed description of each item:

ITEM NAMETYPEDESCRIPTION
GVKEYinteger, primary key (1)Compustat GVKEY
LIIDchar(3), primary key (2)Compustat IID. Dummy IID assigned with an “X” suffix during a range when company data exists but no Compustat security is identified.
LINKDTinteger (date), primary key (3)First effective calendar date of link record range
LINKENDDTinteger (date)Last effective calendar date of link record range
LPERMNOintegerLinked CRSP PERMNO, 0 if no CRSP security link exists
LPERMCOintegerLinked CRSP PERMCO, 0 if no CRSP company link exists
LINKPRIMchar(3)Primary issue marker for the link. Based on Compustat Primary/Joiner flag (PRIMISS), indicating whether this link is to Compustat’s marked primary security during this range.

P = Primary, identified by Compustat in monthly security data.

J = Joiner secondary issue of a company, identified by Compustat in monthly security data.

C = Primary, assigned by CRSP to resolve ranges of overlapping or missing primary markers from Compustat in order to produce one primary security throughout the company history.

N = Secondary, assigned by CRSP to override Compustat. Compustat allows a US and Canadian security to both be marked as Primary at the same time. For Purposes of the link, CRSP allows only one primary at a time and marks the others as N.
LINKTYPEchar(3)Link type code. Each link is given a code describing the connection between the CRSP and Compustat data. Values are:

LC – Link research complete. Standard connection between databases.

LU – Unresearched link to issue by CUSIP

LX – Link to a security that trades on another exchange system not included in CRSP data.

LD – Duplicate Link to a security. Another GVKEY/IID is a better link to that CRSP record.

LN – Primary link exists but Compustat does not have prices.

LS – Link valid for this security only. Other CRSP PERMNOs with the same PERMCO will link to other GVKEYs.

NR – No link available, confirmed by research

NU – No link available, not yet confirmed

I download the link history table as of January 30, 2015. I delete records without a link found (about 67% of all records; remember the link history table is Compustat-centric and the Compustat universe is bigger than the CRSP universe). For remaining records, each of which provides a link between COMPUSTAT and CRSP, I provide the following statistics to give you a sense of LINKPRIM, LINKTYPE, and LIID. As you can see, the vast majority of primary issue markers are identified by CRSP, and “LC” and “LU” types of links constitute about 90% of all identified links.

MERGING CODE

You may notice the following announcement on CCM product:

As of the February 2014 release, USEDFLAG is no longer used in the WRDS CCM web queries.  Please select LINKTYPES LC, LU, and LS for the same results. These represent the vast majority of the links between CRSP securities and Compustat companies, without introducing duplicate data.

The WRDS-created linking dataset (ccmxpf_linktable) has been deprecated. It will continue to be created for a transition period of 1 year. SAS programmers should use the Link History dataset (ccmxpf_lnkhist) from CRSP.

This suggests that many old merging codes should be updated accordingly. Based on the discussion above, the most important query conditioning variables are LINKPRIM, LINKTYPE, LINKDT and LINKENDDT. LINKPRIM is used to select only primary security and remove duplicates. LINKTYPE is used to ensure accuracy. LINKDT and LINKENDDT are used to ensure validity of a link at a give time. I believe that the following code is better than most of sample codes I have seen:

Finally, I acknowledge that this information mainly comes from the official CRSP/Compustat Merged Database Guide.

This entry was posted in Learning Resources, SAS. Bookmark the permalink.

6 Responses to Look into CRSP/COMPUSTAT link table

  1. Carolina Magda says:

    Very good!!!

  2. Jing Xu says:

    Thanks. This is helpful.

  3. Soheila says:

    Thanks so much. These codes are very helpful.

  4. Soheila says:

    Thanks. These codes are very helpful.

  5. Victor says:

    Good stuff! Very helpful!

    I have a question for you. I saw this on the web:

    usually linkdt and linkenddt is a date, but linkdt can be ‘B’ (beginning) and linkenddt
    can be ‘E’ (end).

    How do you handle this when linkdt and linkenddt have values of ‘B’ or ‘E’?

    Thank you!

Leave a Reply

Your email address will not be published. Required fields are marked *