
SAM.gov

Introduction

What is SAM.gov?

The System for Award Management (SAM.gov) is an official website of the U.S. Government. There is no cost to use SAM.gov.

1. What data does SAM.gov provide?

  1. Access publicly available award data via data extracts and system accounts
  2. Search for entity registration and exclusion records
  3. Search for assistance listings (formerly CFDA.gov), wage determinations (formerly WDOL.gov), contract opportunities (formerly FBO.gov), and contract data reports (formerly part of FPDS.gov)

2. What data do we use?


  1. Contract Opportunities
  2. Assistance Listings (used earlier; planned for future use)

Data Flow

Scheduled daily, Monday to Friday

  1. SAM.gov serves its data via an API.

  2. The latest and historical data can also be accessed from the Data Services tab. This data is stored in an AWS S3 bucket.

  3. We first generate the metadata for the raw data scraper to run.

  4. The metadata consists of URLs to the respective files offered by the source. In the scraping step, we then send an HTTP GET request to each URL and fetch the data.

  5. In the cleaning step, the extracted raw data is cleaned, with special attention to specific fields:

•Country names are converted to ISO 3166 alpha-3 country codes

•A unique ID is generated for every tender

•Location information, such as the location label and map coordinates, is derived from the cleaned data using forward and reverse geocoding

•If no location information other than the country is available, the geocoded central coordinates of that country are used

•The column names are converted to lower snake case

•In the standardization step, statuses are mapped and sector and subsector values are standardized

•Metadata is generated for the entire DAG run
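As an alternative to the Data Services extracts, the API mentioned in step 1 can be queried directly. The sketch below only builds a request URL for GSA's Get Opportunities Public API. The endpoint, the parameter names (`api_key`, `postedFrom`, `postedTo`, `limit`, `offset`) and the MM/dd/yyyy date format reflect GSA's public documentation as we understand it and should be checked against the current version; this is not the pipeline's actual scraper.

```python
from datetime import date
from urllib.parse import urlencode

# Assumed endpoint of GSA's Get Opportunities Public API (v2); verify against current docs.
SEARCH_URL = "https://api.sam.gov/opportunities/v2/search"


def build_search_url(api_key: str, posted_from: date, posted_to: date,
                     limit: int = 1000, offset: int = 0) -> str:
    """Build a search URL for opportunities posted between posted_from and posted_to."""
    if posted_to < posted_from:
        raise ValueError("posted_to must not be earlier than posted_from")
    params = {
        "api_key": api_key,
        # Dates are sent as MM/dd/yyyy, per the public API documentation.
        "postedFrom": posted_from.strftime("%m/%d/%Y"),
        "postedTo": posted_to.strftime("%m/%d/%Y"),
        "limit": limit,    # page size
        "offset": offset,  # pagination offset
    }
    # urlencode percent-encodes the slashes in the dates (/ -> %2F).
    return f"{SEARCH_URL}?{urlencode(params)}"


# Placeholder key; a real key is issued from a SAM.gov account.
url = build_search_url("YOUR_API_KEY", date(2024, 1, 1), date(2024, 1, 31))
```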
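Steps 3 and 4 (generate metadata, then fetch each URL with an HTTP GET request) can be sketched as two small functions. The base URL and the file-naming pattern are hypothetical placeholders, not the real SAM.gov extract paths; only the two-phase structure mirrors the description above.

```python
from datetime import date, timedelta
from urllib.request import urlopen

# Hypothetical location and naming scheme for the daily extract files.
BASE_URL = "https://example.com/sam-extracts"


def generate_metadata(start: date, end: date) -> list:
    """Return one metadata record (date + file URL) per day from start to end, inclusive."""
    records = []
    day = start
    while day <= end:
        records.append({
            "date": day.isoformat(),
            "url": f"{BASE_URL}/ContractOpportunities_{day:%Y%m%d}.csv",
        })
        day += timedelta(days=1)
    return records


def fetch(record: dict, timeout: float = 60.0) -> bytes:
    """Send an HTTP GET request to the record's URL and return the raw response body."""
    with urlopen(record["url"], timeout=timeout) as response:
        return response.read()


metadata = generate_metadata(date(2024, 1, 5), date(2024, 1, 8))
# raw_files = [fetch(record) for record in metadata]  # requires network access
```

Keeping metadata generation separate from fetching means a failed download can be retried from its metadata record without regenerating the whole list.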
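Three of the cleaning rules above, sketched as small helpers. The country table is a hypothetical subset (a full ISO 3166 table, or a library such as `pycountry`, would be needed in practice), and the header names used with `to_snake_case` are assumed examples. The `aug_id` format `<source>_<original_id>` is read off the sample record further down.

```python
import re

# Hypothetical subset of an ISO 3166 country-name -> alpha-3 table.
COUNTRY_TO_ALPHA3 = {
    "united states": "USA",
    "united states of america": "USA",
    "canada": "CAN",
    "germany": "DEU",
}


def to_alpha3(country_name):
    """Map a free-text country name to an ISO 3166 alpha-3 code, or None if unknown."""
    if not country_name:
        return None
    return COUNTRY_TO_ALPHA3.get(country_name.strip().lower())


def make_aug_id(source: str, original_id: str) -> str:
    """Build the unique tender ID as <source abbreviation>_<SAM.gov original_id>."""
    return f"{source}_{original_id}"


def to_snake_case(name: str) -> str:
    """Convert a header such as 'PostedDate' or 'AAC Code' to lower snake case."""
    # Underscore between a lowercase letter/digit and an uppercase letter: PostedDate -> Posted_Date
    name = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name)
    # Underscore between an uppercase letter and a capitalised word: ASide -> A_Side
    name = re.sub(r"(?<=[A-Z])(?=[A-Z][a-z])", "_", name)
    # Collapse spaces, slashes, hyphens and repeated underscores into one underscore
    name = re.sub(r"[^0-9A-Za-z]+", "_", name)
    return name.strip("_").lower()
```

For example, `to_snake_case("SetASide")` yields `set_a_side` and `to_snake_case("AAC Code")` yields `aac_code`, both of which appear in the ingestion schema.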
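The country-centroid fallback described above can be expressed as follows. `geocode` stands in for whatever forward geocoder the pipeline actually uses (a hypothetical callable here), and the centroid values are rough illustrative figures. The output mirrors the `map_coordinates` and `geometry` fields of the ingestion schema; note that GeoJSON lists positions as [longitude, latitude].

```python
from typing import Callable, Optional, Tuple

LatLon = Tuple[float, float]

# Rough, illustrative country centroids (lat, lon), keyed by ISO 3166 alpha-3 code.
COUNTRY_CENTROIDS = {
    "USA": (39.8, -98.6),
    "CAN": (56.1, -106.3),
}


def locate(address: Optional[str], country_code: Optional[str],
           geocode: Callable[[str], Optional[LatLon]]) -> dict:
    """Return map_coordinates and a GeoJSON geometry for one tender."""
    coords = geocode(address) if address else None
    if coords is None and country_code:
        # Only the country is known: fall back to that country's central coordinates.
        coords = COUNTRY_CENTROIDS.get(country_code)
    if coords is None:
        return {"map_coordinates": {"lat": None, "lon": None}, "geometry": None}
    lat, lon = coords
    return {
        "map_coordinates": {"lat": lat, "lon": lon},
        # GeoJSON orders positions as [longitude, latitude].
        "geometry": {"type": "Point", "coordinates": [lon, lat]},
    }
```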

Ingestion

•The final processed data is then bulk-ingested into Elasticsearch in batches

•An index is created for that data product

•The index is updated and refreshed with the newly added data
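A minimal sketch of the ingestion step using only the standard library and Elasticsearch's `_bulk` REST endpoint. The index name and cluster URL are hypothetical. Using `aug_id` as the document `_id` is an assumption on our part, chosen so that re-ingesting a tender overwrites it instead of creating a duplicate; `refresh=true` makes the new documents searchable immediately, matching the last bullet above.

```python
import json
from urllib.request import Request, urlopen


def build_bulk_body(index: str, docs: list) -> bytes:
    """Serialise docs as an Elasticsearch _bulk NDJSON payload keyed by aug_id."""
    lines = []
    for doc in docs:
        # Action line, then source line; aug_id as _id makes re-ingestion idempotent.
        lines.append(json.dumps({"index": {"_index": index, "_id": doc["aug_id"]}}))
        lines.append(json.dumps(doc))
    # The bulk API requires the payload to end with a newline.
    return ("\n".join(lines) + "\n").encode("utf-8")


def send_bulk(es_url: str, body: bytes) -> dict:
    """POST the payload to the _bulk endpoint and refresh so new data is searchable."""
    request = Request(
        f"{es_url}/_bulk?refresh=true",
        data=body,
        headers={"Content-Type": "application/x-ndjson"},
        method="POST",
    )
    with urlopen(request) as response:
        return json.loads(response.read())


# Usage (hypothetical cluster and index name):
# send_bulk("http://localhost:9200", build_bulk_body("samgovco", records))
```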

Ingestion Schema

Sr. No Column Name Description
1 aac_code Activity Address Code (AAC) - six-character code identifying the office
2 active Whether the tender or contract opportunity is Active or Inactive
3 additional_info_link Additional Info Link
4 address Office Address
5 archive_date Archival Date
6 archive_type Archive Type
7 aug_id Unique ID generated for identification of contract opportunity
8 award_date Award Date
9 award_number Award ID
10 award_usd Award Amount in USD
11 awardee Awardee
12 basetype Opportunity original type
13 budget Budget in USD
14 cgac CGAC: Common Government-wide Accounting Classification Code - for department/agency
15 city Office City
16 classification_code Classification Code
17 country_code ISO 3166 Alpha 3 country code
18 country_name Short Country Name
19 countrycode ISO 3166 Alpha 2 country code
20 department_or_industry_agency Department Or Industry Agency
21 description Tender Description
22 fpds_code Federal Procurement Data System (FPDS) Code - for Sub-Tier
23 geometry GeoJSON having map coordinates
24 identified_sector Identified Sector
25 identified_sector_subsector_tuple List of identified (sector, subsector) pairs
26 identified_status Identified Status
27 identified_subsector Identified Subsector
28 industry Industry name or NAICS code description
29 latitude Latitude
30 url Link to the Contract Opportunity Page
31 longitude Longitude
32 map_coordinates Map Coordinates in decimal degrees
33 naics_code North American Industry Classification System Code (NAICS)
34 name Tender’s name
35 office Office
36 organization_type Type of an organization – department/sub-tier/office
37 original_id ID given by SAM.gov
38 pop_city Place of Performance City
39 pop_country Place of Performance Country
40 pop_state Place of Performance State
41 pop_street_address Place of Performance Street Address
42 pop_zip Place of Performance Zip
43 posted_date Posted Date
44 primary_contact_email Primary Contact Email
45 primary_contact_fax Primary Contact Fax
46 primary_contact_fullname Primary Contact Full Name
47 primary_contact_phone Primary Contact Phone Number
48 primary_contact_title Primary Contact Title
49 project_or_tender Label for Project or Tender
50 region_code Region Code - 3 letter
51 region_name Region Name
52 response_deadline Response Deadline
53 secondary_contact_email Secondary Contact Email
54 secondary_contact_fax Secondary Contact Fax
55 secondary_contact_fullname Secondary Contact Full Name
56 secondary_contact_phone Secondary Contact Phone Number
57 secondary_contact_title Secondary Contact Title
58 sector Sector - same as industry
59 set_a_side Set Aside
60 set_a_side_code Set Aside Code
61 solicitation_id Solicitation Id
62 source Source Abbreviation
63 stage Stage
64 state Office State
65 status Status of Tender
66 sub_tier Agency Name (L2)
67 subsector Subsector
68 timestamps Map of all the important dates
69 timestamp_range MIN and MAX timestamps
70 zipcode Place of Performance (Zip code)

Sample Data

[
  {
    "original_id": "f078d8644400c3223f7ecd1f8cf068b6",
    "name": "D -- Western Range Operations Communication And Information (Wroci)",
    "solicitation_id": "F04684-01-R-0008",
    "department_or_industry_agency": "Dept Of Defense",
    "cgac": 57.0,
    "sub_tier": "Dept Of The Air Force",
    "fpds_code": "5700",
    "office": "FA4610 30 CONS PK",
    "aac_code": "FA4610",
    "posted_date": "2001-09-28",
    "stage": "Presolicitation",
    "basetype": "Presolicitation",
    "archive_type": "manual",
    "archive_date": "2003-07-26",
    "set_a_side_code": "SBA",
    "set_a_side": "Total Small Business Set-Aside (FAR 19.5)",
    "response_deadline": "2002-08-19",
    "naics_code": null,
    "classification_code": "D",
    "pop_street_address": "Vandenberg AFB\n93437",
    "pop_city": null,
    "pop_state": null,
    "pop_zip": null,
    "pop_country": null,
    "active": "No",
    "award_number": null,
    "award_date": "NaT",
    "award_usd": null,
    "awardee": null,
    "primary_contact_title": "Contracting Officer",
    "primary_contact_fullname": "Evelyn L Swain",
    "primary_contact_email": "evelyn.swain@vandenberg.af.mil",
    "primary_contact_phone": "(805)606-3981",
    "primary_contact_fax": "(805)606-5193",
    "secondary_contact_title": null,
    "secondary_contact_fullname": null,
    "secondary_contact_email": null,
    "secondary_contact_phone": null,
    "secondary_contact_fax": null,
    "organization_type": "OFFICE",
    "state": "CA",
    "city": "Vandenberg Sfb",
    "zipcode": "93437-5212",
    "countrycode": "USA",
    "additional_info_link": null,
    "url": "https://beta.sam.gov/opp/f078d8644400c3223f7ecd1f8cf068b6/view",
    "description": null,
    "budget": null,
    "country_code": "USA",
    "country_name": "United States of America",
    "region_name": "North America",
    "region_code": "NAC",
    "status": "Archived",
    "industry": "Not Identified",
    "project_or_tender": "T",
    "source": "SAMGOVCO",
    "aug_id": "SAMGOVCO_f078d8644400c3223f7ecd1f8cf068b6",
    "latitude": null,
    "longitude": null,
    "geometry": null,
    "address": null,
    "map_coordinates": {
      "lat": null,
      "lon": null
    },
    "sector": "Not Identified",
    "subsector": "Not Available",
    "identified_sector": "industry, trade and services",
    "identified_subsector": "infrastructure",
    "identified_sector_subsector_tuple": [
      ["industry, trade and services", "infrastructure"],
      ["ict", "telecommunication"]
    ],
    "identified_status": "Closed",
    "timestamps": {
      "archive_date": "2003-07-26 00:00:00",
      "award_date": null,
      "posted_date": "2001-09-28 00:00:00",
      "response_deadline": "2002-08-19 00:00:00"
    },
    "timestamp_range": {
      "min": "2001-09-28 00:00:00",
      "max": "2003-07-26 00:00:00"
    }
  }
]
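The `timestamps` and `timestamp_range` fields in the sample can be derived from the four date columns. This sketch assumes dates arrive as `YYYY-MM-DD` strings and that missing values may appear as `None`, an empty string, or pandas' `"NaT"` string (as in `award_date` above).

```python
from datetime import datetime

DATE_FIELDS = ("archive_date", "award_date", "posted_date", "response_deadline")
OUTPUT_FORMAT = "%Y-%m-%d %H:%M:%S"


def parse_date(value):
    """Parse a 'YYYY-MM-DD...' string; treat None, '' and pandas' 'NaT' as missing."""
    if value in (None, "", "NaT"):
        return None
    return datetime.strptime(value[:10], "%Y-%m-%d")


def build_timestamps(record: dict) -> dict:
    """Add the timestamps map and its min/max range to a record, in place."""
    parsed = {field: parse_date(record.get(field)) for field in DATE_FIELDS}
    record["timestamps"] = {
        field: value.strftime(OUTPUT_FORMAT) if value else None
        for field, value in parsed.items()
    }
    # The range only considers dates that are actually present.
    present = [value for value in parsed.values() if value is not None]
    record["timestamp_range"] = {
        "min": min(present).strftime(OUTPUT_FORMAT) if present else None,
        "max": max(present).strftime(OUTPUT_FORMAT) if present else None,
    }
    return record
```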

Status Mappings

Original Field Name Original Value Mapped To
status Yes Active
status No Closed
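Applied to a record, the status table amounts to a dictionary lookup. In the sample record, `"active": "No"` ends up as `"identified_status": "Closed"`, so this sketch assumes the Yes/No values are read from the `active` column; unrecognised values are left as `None`.

```python
# Yes/No -> normalised status, as listed in the status mapping table.
STATUS_MAP = {"Yes": "Active", "No": "Closed"}


def map_status(raw_value):
    """Return the normalised status for a raw Yes/No value, or None if unrecognised."""
    if raw_value is None:
        return None
    # Normalise case and whitespace so ' no ' and 'YES' are also recognised.
    return STATUS_MAP.get(str(raw_value).strip().capitalize())


def apply_status(record: dict) -> dict:
    """Set identified_status from the record's `active` flag (assumed source column)."""
    record["identified_status"] = map_status(record.get("active"))
    return record
```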

NAICS Code & Description

Metadata

Attribute Description
Source Name The System for Award Management – Contract Opportunities
Source Abbreviation SAMGOVCO
Website URL https://sam.gov/content/home
Temporal Coverage 1970 - Present
Geographical Coverage Mainly the USA, with opportunities from countries all over the world
Update Frequency Daily
Source Type Tenders & Procurements

Scope of Improvement

  • Storage management and memory usage can be improved by processing data in batches.
  • The paths where the data is stored can be generated dynamically.
  • Historical data needs to be scraped only once.
  • Avoid repeatedly deleting and recreating Elasticsearch indices; update them after every iteration instead.
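The first and last improvements can be combined: process records in fixed-size batches and send each batch as upserts rather than rebuilding the index. The sketch below uses Elasticsearch's bulk `update` action with `doc_as_upsert`, which updates an existing document or inserts it if missing; keying on `aug_id` and the index name are assumptions.

```python
import json
from itertools import islice
from typing import Iterable, Iterator


def batched(records: Iterable[dict], size: int) -> Iterator[list]:
    """Yield lists of at most `size` records so the full dataset never sits in memory."""
    iterator = iter(records)
    while batch := list(islice(iterator, size)):
        yield batch


def upsert_body(index: str, batch: list) -> bytes:
    """Build a _bulk payload that updates existing docs or inserts new ones, keyed by aug_id."""
    lines = []
    for doc in batch:
        lines.append(json.dumps({"update": {"_index": index, "_id": doc["aug_id"]}}))
        # doc_as_upsert: use `doc` as the full document if the _id does not exist yet.
        lines.append(json.dumps({"doc": doc, "doc_as_upsert": True}))
    return ("\n".join(lines) + "\n").encode("utf-8")
```

Each payload can then be POSTed to the `_bulk` endpoint as in the ingestion sketch, so the index persists between runs.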