SAM.gov
Introduction
What is SAM.gov?
The System for Award Management (SAM.gov) is an official website of the U.S. Government. There is no cost to use SAM.gov.
1. What data does SAM.gov provide?
- Access publicly available award data via data extracts and system accounts
- Search for entity registration and exclusion records
- Search for assistance listings (formerly CFDA.gov), wage determinations (formerly WDOL.gov), contract opportunities (formerly FBO.gov), and contract data reports (formerly part of FPDS.gov)
2. What data do we use?
- Contract Opportunities
- Assistance Listings (used earlier; planned for future use)
Data Flow
Scheduled daily, Monday to Friday.
API
- SAM.gov serves its data via an API.
- The latest and historical data can also be accessed from the Data Services tab.
- Their data is stored in an AWS S3 bucket.
Scraping
- We first generate the metadata for the raw data scraper to run.
- The metadata consists of URLs to the respective files offered by the source.
- We then send HTTP GET requests to each URL and fetch the data.
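The scraping steps can be sketched roughly as follows. This is a minimal illustration, not the pipeline's actual code: the function names `build_metadata`, `http_get`, and `fetch_all`, the metadata keys, and any base URL or file names passed in are hypothetical. The HTTP call is injectable so the sketch can be exercised without network access.

```python
from typing import Callable, Dict, List
from urllib.request import urlopen


def build_metadata(base_url: str, file_names: List[str]) -> List[Dict[str, str]]:
    """Build one metadata record (file name + URL) per source file."""
    base = base_url.rstrip("/")
    return [{"file_name": name, "url": f"{base}/{name}"} for name in file_names]


def http_get(url: str) -> bytes:
    """Send an HTTP GET request and return the raw response body."""
    with urlopen(url, timeout=60) as response:
        return response.read()


def fetch_all(
    metadata: List[Dict[str, str]],
    get: Callable[[str], bytes] = http_get,
) -> Dict[str, bytes]:
    """Fetch every URL listed in the metadata, keyed by file name."""
    return {item["file_name"]: get(item["url"]) for item in metadata}
```

Keeping metadata generation separate from fetching means the list of URLs can be inspected or stored before any request is sent.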
Cleaning
- The extracted raw data is cleaned in this step, with special attention to specific fields.
- Country names are converted to 3-letter ISO country codes.
- A unique ID is generated for every tender.
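A minimal sketch of the two cleaning rules, under stated assumptions: the lookup table is a tiny illustrative subset, the function names are hypothetical, and the unique ID format (source abbreviation, underscore, source ID) is inferred from the `aug_id` value in the sample record rather than confirmed by the pipeline.

```python
from typing import Optional

# Illustrative subset only; a real pipeline would need a full ISO 3166 table.
COUNTRY_TO_ALPHA3 = {
    "united states of america": "USA",
    "united states": "USA",
    "canada": "CAN",
    "germany": "DEU",
}


def to_alpha3(country_name: Optional[str]) -> Optional[str]:
    """Return the ISO 3166 alpha-3 code for a country name, or None if unknown."""
    if not country_name:
        return None
    return COUNTRY_TO_ALPHA3.get(country_name.strip().lower())


def make_aug_id(source: str, original_id: str) -> str:
    """Join the source abbreviation and the source's own ID (assumed format)."""
    return f"{source}_{original_id}"
```

Returning `None` for an unknown country keeps unmapped values visible instead of silently guessing a code.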
Geocoding
- We derive location information, such as the location label and map coordinates, from the cleaned data using forward and reverse geocoding.
- If no location information other than the country is available, the central coordinates of that country are used.
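The country-level fallback could look roughly like the sketch below. It does not perform forward or reverse geocoding itself, since that depends on whichever geocoding service the pipeline uses. The centroid values are approximate and purely illustrative, and the function name is hypothetical. The output keys follow the `latitude`, `longitude`, `map_coordinates`, and `geometry` fields of the ingestion schema.

```python
from typing import Dict, Tuple

# Approximate, illustrative centroids as (lat, lon); not the pipeline's actual table.
COUNTRY_CENTROIDS: Dict[str, Tuple[float, float]] = {
    "USA": (39.8, -98.6),
    "CAN": (56.1, -106.3),
}


def resolve_coordinates(record: dict) -> dict:
    """Fill coordinate fields, falling back to the country centroid if needed."""
    lat, lon = record.get("latitude"), record.get("longitude")
    if lat is None or lon is None:
        # No precise location: use the centroid of the record's country, if known.
        lat, lon = COUNTRY_CENTROIDS.get(record.get("country_code"), (None, None))
    out = dict(record, latitude=lat, longitude=lon)
    out["map_coordinates"] = {"lat": lat, "lon": lon}
    # GeoJSON points list longitude first, then latitude.
    out["geometry"] = (
        {"type": "Point", "coordinates": [lon, lat]} if lat is not None else None
    )
    return out
```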
Standardization
- The column names are converted to lower snake case.
- Statuses are mapped, and sector and subsector are standardized.
- Metadata is generated for the entire DAG run.
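One way to convert column names to lower snake case is shown below. The input names in the docstring are hypothetical examples, not the source's actual headers, and the pipeline may use a different rule set.

```python
import re


def to_snake_case(name: str) -> str:
    """Convert a column name such as 'Sub-Tier' or 'NaicsCode' to lower snake case."""
    # Insert an underscore at lower-to-upper boundaries: 'NaicsCode' -> 'Naics_Code'.
    name = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name.strip())
    # Collapse every run of non-alphanumeric characters into one underscore.
    name = re.sub(r"[^A-Za-z0-9]+", "_", name)
    return name.strip("_").lower()
```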
Ingestion (Elasticsearch)
- The final processed data is then batch/bulk ingested into Elasticsearch.
- An index is created for that data product.
- The index gets updated and refreshed with newly added data.
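As a rough sketch of the bulk ingestion step, the generator below builds one bulk action per record in the dictionary format accepted by the `helpers.bulk` function of the official Elasticsearch Python client. The index name `samgovco`, the cluster address, and the choice of `aug_id` as the document ID are assumptions made for illustration.

```python
from typing import Dict, Iterable, Iterator


def bulk_actions(records: Iterable[dict], index_name: str) -> Iterator[Dict]:
    """Yield one bulk 'index' action per record, keyed by its aug_id."""
    for record in records:
        yield {
            "_op_type": "index",  # create the document or overwrite it
            "_index": index_name,
            "_id": record["aug_id"],  # assumed document ID
            "_source": record,
        }


# With the official client installed, the actions could be sent like this:
#   from elasticsearch import Elasticsearch, helpers
#   client = Elasticsearch("http://localhost:9200")  # assumed address
#   helpers.bulk(client, bulk_actions(records, "samgovco"))
#   client.indices.refresh(index="samgovco")
```

Using a stable document ID means re-ingesting the same tender overwrites the existing document rather than creating a duplicate.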
Ingestion Schema
Sr. No | Column Name | Description |
---|---|---|
1 | aac_code | Activity Address Code (AAC) - six-digit code for office |
2 | active | Whether the tender or contract opportunity is Active or Inactive |
3 | additional_info_link | Additional Info Link |
4 | address | Office Address |
5 | archive_date | Archival Date |
6 | archive_type | Archive Type |
7 | aug_id | Unique ID generated for identification of contract opportunity |
8 | award_date | Award Date |
9 | award_number | Award ID |
10 | award_usd | Award Amount in USD |
11 | awardee | Awardee |
12 | basetype | Opportunity original type |
13 | budget | Budget in USD |
14 | cgac | CGAC: Common Government-wide Accounting Classification Code - for department/agency |
15 | city | Office City |
16 | classification_code | Classification Code |
17 | country_code | ISO 3166 Alpha 3 country code |
18 | country_name | Short Country Name |
19 | countrycode | ISO 3166 Alpha 2 country code |
20 | department_or_industry_agency | Department Or Industry Agency |
21 | description | Tender Description |
22 | fpds_code | Federal Procurement Data System (FPDS) Code - for Sub-Tier |
23 | geometry | GeoJSON having map coordinates |
24 | identified_sector | Identified Sector |
25 | identified_sector_subsector_tuple | Identified (sector, subsector) pairs |
26 | identified_status | Identified Status |
27 | identified_subsector | Identified Subsector |
28 | industry | Industry name or NAICS code description |
29 | latitude | Latitude |
30 | url | Link to the Contract Opportunity Page |
31 | longitude | Longitude |
32 | map_coordinates | Map Coordinates in decimal degrees |
33 | naics_code | North American Industry Classification System Code (NAICS) |
34 | name | Tender’s name |
35 | office | Office |
36 | organization_type | Type of an organization – department/sub-tier/office |
37 | original_id | ID given by SAM.gov |
38 | pop_city | Place of Performance City |
39 | pop_country | Place of Performance Country |
40 | pop_state | Place of Performance State |
41 | pop_street_address | Place of Performance Street Address |
42 | pop_zip | Place of Performance Zip |
43 | posted_date | Posted Date |
44 | primary_contact_email | Primary Contact Email |
45 | primary_contact_fax | Primary Contact Fax |
46 | primary_contact_fullname | Primary Contact Full Name |
47 | primary_contact_phone | Primary Contact Phone Number |
48 | primary_contact_title | Primary Contact Title |
49 | project_or_tender | Label for Project or Tender |
50 | region_code | Region Code - 3 letter |
51 | region_name | Region Name |
52 | response_deadline | Response Deadline |
53 | secondary_contact_email | Secondary Contact Email |
54 | secondary_contact_fax | Secondary Contact Fax |
55 | secondary_contact_fullname | Secondary Contact Full Name |
56 | secondary_contact_phone | Secondary Contact Phone Number |
57 | secondary_contact_title | Secondary Contact Title |
58 | sector | Sector - same as industry |
59 | set_a_side | Set Aside |
60 | set_a_side_code | Set Aside Code |
61 | solicitation_id | Solicitation Id |
62 | source | Source Abbreviation |
63 | stage | Stage |
64 | state | Office State |
65 | status | Status of Tender |
66 | sub_tier | Agency Name (L2) |
67 | subsector | Subsector |
68 | timestamps | Hash (dictionary) of all the important dates |
69 | timestamp_range | MIN and MAX timestamps |
70 | zipcode | Place of Performance (Zip code) |
Sample Data
[
  {
    "original_id": "f078d8644400c3223f7ecd1f8cf068b6",
    "name": "D -- Western Range Operations Communication And Information (Wroci)",
    "solicitation_id": "F04684-01-R-0008",
    "department_or_industry_agency": "Dept Of Defense",
    "cgac": 57.0,
    "sub_tier": "Dept Of The Air Force",
    "fpds_code": "5700",
    "office": "FA4610 30 CONS PK",
    "aac_code": "FA4610",
    "posted_date": "2001-09-28",
    "stage": "Presolicitation",
    "basetype": "Presolicitation",
    "archive_type": "manual",
    "archive_date": "2003-07-26",
    "set_a_side_code": "SBA",
    "set_a_side": "Total Small Business Set-Aside (FAR 19.5)",
    "response_deadline": "2002-08-19",
    "naics_code": null,
    "classification_code": "D",
    "pop_street_address": "Vandenberg AFB\n93437",
    "pop_city": null,
    "pop_state": null,
    "pop_zip": null,
    "pop_country": null,
    "active": "No",
    "award_number": null,
    "award_date": "NaT",
    "award_usd": null,
    "awardee": null,
    "primary_contact_title": "Contracting Officer",
    "primary_contact_fullname": "Evelyn L Swain",
    "primary_contact_email": "evelyn.swain@vandenberg.af.mil",
    "primary_contact_phone": "(805)606-3981",
    "primary_contact_fax": "(805)606-5193",
    "secondary_contact_title": null,
    "secondary_contact_fullname": null,
    "secondary_contact_email": null,
    "secondary_contact_phone": null,
    "secondary_contact_fax": null,
    "organization_type": "OFFICE",
    "state": "CA",
    "city": "Vandenberg Sfb",
    "zipcode": "93437-5212",
    "countrycode": "USA",
    "additional_info_link": null,
    "url": "https:\/\/beta.sam.gov\/opp\/f078d8644400c3223f7ecd1f8cf068b6\/view",
    "description": null,
    "budget": null,
    "country_code": "USA",
    "country_name": "United States of America",
    "region_name": "North America",
    "region_code": "NAC",
    "status": "Archived",
    "industry": "Not Identified",
    "project_or_tender": "T",
    "source": "SAMGOVCO",
    "aug_id": "SAMGOVCO_f078d8644400c3223f7ecd1f8cf068b6",
    "latitude": null,
    "longitude": null,
    "geometry": null,
    "address": null,
    "map_coordinates": {
      "lat": null,
      "lon": null
    },
    "sector": "Not Identified",
    "subsector": "Not Available",
    "identified_sector": "industry, trade and services",
    "identified_subsector": "infrastructure",
    "identified_sector_subsector_tuple": [
      [
        "industry, trade and services",
        "infrastructure"
      ],
      [
        "ict",
        "telecommunication"
      ]
    ],
    "identified_status": "Closed",
    "timestamps": {
      "archive_date": "2003-07-26 00:00:00",
      "award_date": null,
      "posted_date": "2001-09-28 00:00:00",
      "response_deadline": "2002-08-19 00:00:00"
    },
    "timestamp_range": {
      "min": "2001-09-28 00:00:00",
      "max": "2003-07-26 00:00:00"
    }
  }
]
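The schema describes `timestamp_range` as the MIN and MAX timestamps. A minimal sketch of how it could be derived from the `timestamps` object is shown below. The function name is hypothetical, and the sketch assumes every non-null value uses the same fixed-width date-time layout, so that plain string comparison matches chronological order.

```python
from typing import Dict, Optional


def timestamp_range(timestamps: Dict[str, Optional[str]]) -> Dict[str, Optional[str]]:
    """Return the earliest and latest non-null timestamps.

    Assumes all values share one fixed-width layout, so string order
    equals chronological order.
    """
    values = [value for value in timestamps.values() if value is not None]
    if not values:
        return {"min": None, "max": None}
    return {"min": min(values), "max": max(values)}
```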
Status Mappings
Original | Field Name | Mapped to |
---|---|---|
Yes | status | Active |
No | status | Closed |
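The status mapping can be expressed as a simple lookup. This is a sketch with a hypothetical function name. Which output field receives the mapped value is an assumption: in the sample record, `"active": "No"` appears alongside `"identified_status": "Closed"`.

```python
from typing import Optional

# Mapping taken from the Status Mappings table.
STATUS_MAP = {"Yes": "Active", "No": "Closed"}


def map_status(active_flag: Optional[str]) -> Optional[str]:
    """Map the source's Yes/No flag to a standardized status; None if unrecognized."""
    if active_flag is None:
        return None
    # Normalize case and whitespace so 'yes', ' YES ' and 'Yes' all match.
    return STATUS_MAP.get(str(active_flag).strip().title())
```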
NAICS Code & Description
- Refer to https://www.naics.com/search/ for more on interpreting NAICS codes and their detailed descriptions.
Metadata
Attribute | Description |
---|---|
Source Name | The System for Award Management – Contract Opportunities |
Source Abbreviation | SAMGOVCO |
Website URL | https://sam.gov/content/home |
Temporal Coverage | 1970 - Present |
Geographical Coverage | Mainly the USA, plus countries worldwide |
Update Frequency | Daily |
Source Type | Tenders & Procurements |
Scope of Improvement
- Better storage management and memory optimization can be done by processing data in batches.
- We can dynamically generate the paths where the data has to be stored.
- Historical data has to be scraped just once.
- Avoid repeatedly deleting and recreating Elasticsearch indices; update them after every iteration instead.
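The batch-processing idea from the first point could be sketched as a small generator that yields fixed-size chunks, so only one chunk needs to be held in memory at a time. The function name and the batch size are hypothetical, and this is one possible approach rather than a planned implementation.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield successive lists of at most `size` items from any iterable."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    while True:
        # islice pulls only the next `size` items, never the whole input.
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch
```

Each batch could then be cleaned and ingested before the next one is read, for example `for batch in batched(records, 1000): ...`.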