Legal and Regulatory Analysis

In 2024, Senator Roger Wicker published a report on strengthening the defense industrial base and reforming defense acquisition. The latter involves cutting red tape by streamlining regulations.

In this tutorial, we will analyze parts of the Federal Acquisition Regulation (FAR) to identify which of them are driven by statutory requirements.

For illustration purposes, we will focus our analysis on Part 9 of the FAR: contractor qualifications.

part_prefixes = ['9.']
from onprem import LLM
from onprem.ingest import load_single_document, extract_files
from onprem import utils as U
from tqdm import tqdm

import pandas as pd


pd.set_option('display.max_colwidth', None)

STEP 1: Download the Data

We will first download the HTML version of the FAR.

import zipfile
import tempfile
import os

# URL of the ZIP file
url = "https://www.acquisition.gov/sites/default/files/current/far/zip/html/FARHTML.zip"

# Create a temporary directory
temp_dir = tempfile.mkdtemp()
zip_path = os.path.join(temp_dir, "FARHTML.zip")

# Download the ZIP file
U.download(url, zip_path, verify=True)

# Extract the ZIP file
with zipfile.ZipFile(zip_path, 'r') as zip_ref:
    zip_ref.extractall(temp_dir)

print(f"\nFiles extracted to: {temp_dir}")

# Keep only HTML files whose base name starts with one of the part prefixes
filenames = [
    fname for fname in extract_files(temp_dir)
    if fname.lower().endswith('.html')
    and any(os.path.basename(fname).startswith(prefix) for prefix in part_prefixes)
]
print(f'Total files: {len(list(extract_files(temp_dir)))}')
print(f'Number of files of interest: {len(filenames)}')
print('Sample:')
for fname in filenames[:5]:
    print(f'\t{fname}')
[██████████████████████████████████████████████████]
Files extracted to: /tmp/tmp3xa2rj1w
Total files: 3900
Number of files of interest: 106
Sample:
    /tmp/tmp3xa2rj1w/dita_html/9.406-3.html
    /tmp/tmp3xa2rj1w/dita_html/9.505-4.html
    /tmp/tmp3xa2rj1w/dita_html/9.201.html
    /tmp/tmp3xa2rj1w/dita_html/9.406-5.html
    /tmp/tmp3xa2rj1w/dita_html/9.104-1.html
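
The filter above keeps a file only when its base name ends in .html and starts with one of the part prefixes. As a minimal sketch of that check in isolation, the helper below (the name `is_part_file` and the sample paths are hypothetical, chosen for illustration) shows that the prefix '9.' does not pick up files from Part 19, because `startswith` anchors the match at the beginning of the base name.

```python
import os

def is_part_file(path, prefixes):
    """Return True if path is an HTML file whose base name starts with one of the prefixes."""
    name = os.path.basename(path)
    return name.lower().endswith('.html') and any(name.startswith(p) for p in prefixes)

# Hypothetical paths, for illustration only
paths = [
    '/tmp/far/dita_html/9.104-1.html',
    '/tmp/far/dita_html/19.502-2.html',   # Part 19: base name starts with '19.', not '9.'
    '/tmp/far/dita_html/9.406-3.html',
    '/tmp/far/dita_html/9.104-1.dita',    # wrong extension
]

selected = [p for p in paths if is_part_file(p, ['9.'])]
print(selected)
```

To analyze several parts at once, more prefixes can be added to the list (for example '9.' and '19.').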

STEP 2: Text Extraction

We’ll extract text from each of the HTML files.

content = {}
for filename in tqdm(filenames, total=len(filenames)):
    text = load_single_document(filename)[0].page_content
    content[os.path.basename(filename)] = text
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████| 106/106 [00:06<00:00, 16.76it/s]
print(content['9.505-4.html'])
9.505-4 Obtaining access to proprietary information.

(a) When a contractor requires proprietary information from others to perform a Government contract and can use the leverage of the contract to obtain it, the contractor may gain an unfair competitive advantage unless restrictions are imposed. These restrictions protect the information and encourage companies to provide it when necessary for contract performance. They are not intended to protect information-

(1) Furnished voluntarily without limitations on its use; or

(2) Available to the Government or contractor from other sources without restriction.

(b) A contractor that gains access to proprietary information of other companies in performing advisory and assistance services for the Government must agree with the other companies to protect their information from unauthorized use or disclosure for as long as it remains proprietary and refrain from using the information for any purpose other than that for which it was furnished. The contracting officer shall obtain copies of these agreements and ensure that they are properly executed.

(c) Contractors also obtain proprietary and source selection information by acquiring the services of marketing consultants which, if used in connection with an acquisition, may give the contractor an unfair competitive advantage. Contractors should make inquiries of marketing consultants to ensure that the marketing consultant has provided no unfair competitive advantage.

STEP 3: Set Up the LLM and Test the Prompt

Next, we will set up the LLM, construct a prompt for this task, and test it on a small sample of passages from the FAR.

Since the FAR is a publicly available document, we will use a cloud LLM (i.e., gpt-4o-mini) for this task.

llm = LLM(model_url='openai://gpt-4o-mini', mute_stream=True, temperature=0)
/home/amaiya/projects/ghub/onprem/onprem/llm/base.py:217: UserWarning: The model you supplied is gpt-4o-mini, an external service (i.e., not on-premises). Use with caution, as your data and prompts will be sent externally.
  warnings.warn(f'The model you supplied is {self.model_name}, an external service (i.e., not on-premises). '+\
prompt = """
Given text from the Federal Acquisition Regulations (FAR), extract a list of explicitly cited statutes.
If there are no explicitly cited statutes, return NA. If there are, return a list of cited statutes with each statute on a separate line.
Do not include references to the FAR itself which are numbers with dots or dashes (e.g., 1.102-1, 3.104).

# Example 1:

<TEXT>
(2)A violation, as determined by the Secretary of Commerce, of any agreement of the group known as the "Coordination Committee" for purposes of the Export Administration Act of 1979 (50 U.S.C. App. 2401, et seq.) or any similar bilateral or multilateral export control agreement.

<STATUTES>
50 U.S.C. App. 2401 

# Example 2:

<TEXT>
9.400 Scope of subpart.
(a) This subpart-

(1) Prescribes policies and procedures governing the debarment and suspension of contractors by agencies for the causes given in 9.406-2 and 9.407-2;

(2) Provides for the listing of contractors debarred, suspended, proposed for debarment, and declared ineligible (see the definition of "ineligible" in 2.101); and

(3) Sets forth the consequences of this listing.

<STATUTES>

NA

# Example 3:

<TEXT>

--CONTENT--

<STATUTES>
"""
samples = [ '9.104-1.html', '9.104-2.html', '9.104-3.html', '9.104-4.html', '9.104-5.html', '9.104-6.html', '9.104-7.html']
results = []
for sample in samples:
    output = llm.prompt(prompt.replace('--CONTENT--', content[sample]))
    # Keep one (section, statute) pair per non-empty line; skip the 'NA' sentinel
    lines = [o.strip() for o in output.splitlines()]
    results.extend([(sample, o) for o in lines if o and o != 'NA'])
df = pd.DataFrame(results, columns =['Section', 'Statute'])
df.head()
Section Statute
0 9.104-5.html Pub. L. 113-235
1 9.104-6.html 41 U.S.C. 2313(d)(3)
2 9.104-7.html Pub. L. 113-235
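
The loop above turns the model's reply into rows by splitting it into lines and dropping the NA sentinel. A slightly more defensive version of that parsing step might look like the sketch below. The helper name `parse_statutes` is hypothetical, and the regular expression for FAR self-references (numbers such as 9.406-2 or 2.101) is an assumption about their shape based on the examples in the prompt, not an exhaustive pattern.

```python
import re

# Assumed shape of a FAR self-reference, e.g. 9.406-2 or 2.101
FAR_REF = re.compile(r'^\d+\.\d+(-\d+)?$')

def parse_statutes(output):
    """Split raw LLM output into a list of cited statutes."""
    statutes = []
    for line in output.splitlines():
        line = line.strip()
        # Skip blank lines and the 'NA' sentinel
        if not line or line.upper() == 'NA':
            continue
        # Skip FAR self-references in case the model returns them anyway
        if FAR_REF.match(line):
            continue
        statutes.append(line)
    return statutes

print(parse_statutes("NA"))
print(parse_statutes("41 U.S.C. 2313(d)(3)\n\n 9.406-2 \nPub. L. 113-235"))
```

The second call uses a made-up reply that contains a blank line and a stray FAR reference; only the two statute citations survive.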

STEP 4: Run Analyses on FAR

We now apply the same prompt to every Part 9 file extracted in STEP 2.

results = []
for k in tqdm(content, total=len(content)):
    output = llm.prompt(prompt.replace('--CONTENT--', content[k]))
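    # Keep one (section, statute) pair per non-empty line; skip the 'NA' sentinel
    lines = [o.strip() for o in output.splitlines()]
    results.extend([(k, o) for o in lines if o and o != 'NA'])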
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████| 106/106 [00:55<00:00,  1.90it/s]
df = pd.DataFrame(results, columns =['FAR Section', 'Cited Statute'])
df = df.sort_values(by='FAR Section')
df.head(50)
FAR Section Cited Statute
46 9.103.html 15 U.S.C. 637
31 9.104-5.html Pub. L. 113-235
7 9.104-6.html 41 U.S.C. 2313
0 9.104-7.html Pub. L. 113-235
27 9.105-2.html Pub. L. 111-212
44 9.106-4.html 15 U.S.C. 637
28 9.107.html 41 U.S.C. chapter 85
4 9.108-1.html 6 U.S.C. 395(c)
3 9.108-1.html 6 U.S.C. 395(b)
25 9.108-2.html Pub. L. 110-161
22 9.109-1.html 22 U.S.C. 2593e
39 9.109-4.html 22 U.S.C. 2593e(d)
40 9.109-4.html 22 U.S.C. 2593e(e)
41 9.109-4.html 22 U.S.C. 2593e(g)(2)
42 9.109-4.html 22 U.S.C. 2593e(b)
38 9.109-4.html 22 U.S.C. 2593a
37 9.110-1.html 20 U.S.C. 1001
45 9.110-2.html 10 U.S.C. 983
33 9.110-3.html 10 U.S.C. 983
23 9.200.html 10 U.S.C. 3243
24 9.200.html 41 U.S.C. 3311
1 9.400.html 10 U.S.C. 983
2 9.400.html Federal Acquisition Supply Chain Security Act (FASCSA)
43 9.401.html 31 U.S.C. 6101, note
29 9.402.html 31 U.S.C. 6101, note
30 9.402.html Pub. L. 110-417
15 9.403.html 31 U.S.C. 3801-3812
14 9.403.html 19 U.S.C. 1337
13 9.403.html 50 U.S.C. App. 2401
26 9.405-1.html 10 U.S.C. 983
32 9.405.html 22 U.S.C. 2593e
21 9.406-2.html Title 18 of the United States Code
17 9.406-2.html 11 U.S.C. 362
20 9.406-2.html I.R.C. §6159
19 9.406-2.html I.R.C. §6320
16 9.406-2.html 31 U.S.C. 3729-3733
18 9.406-2.html I.R.C. §6212
12 9.406-4.html 41 U.S.C. chapter 81
10 9.407-2.html Title 18 of the United States Code
9 9.407-2.html Public Law 102-558
8 9.407-2.html 41 U.S.C. chapter 81
11 9.407-2.html 31 U.S.C. 3729-3733
5 9.500.html Pub.L.100-463
6 9.500.html 102 Stat.2270-47
35 9.701.html 15 U.S.C. 640
34 9.701.html 15 U.S.C. 638
36 9.701.html 50 U.S.C. App. 2158
unique_sections = set(df['FAR Section'].values.tolist())
print(f"Out of {len(filenames)} sections or subsections in PART 9 (contractor qualifications) of the FAR, {len(unique_sections)} are driven by statutory requirement.")
Out of 106 sections or subsections in PART 9 (contractor qualifications) of the FAR, 26 are driven by statutory requirement.
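
The same results can also be read from the other direction, asking which statutes are cited by the most FAR sections. The sketch below is one possible way to do that with the standard library; it uses a handful of rows copied from the results table above rather than the full DataFrame, and the variable names are illustrative.

```python
from collections import defaultdict

# A few (FAR section, cited statute) pairs copied from the results table above
rows = [
    ('9.110-2.html', '10 U.S.C. 983'),
    ('9.110-3.html', '10 U.S.C. 983'),
    ('9.400.html', '10 U.S.C. 983'),
    ('9.405-1.html', '10 U.S.C. 983'),
    ('9.103.html', '15 U.S.C. 637'),
    ('9.106-4.html', '15 U.S.C. 637'),
    ('9.107.html', '41 U.S.C. chapter 85'),
]

# Collect the distinct sections that cite each statute
sections_by_statute = defaultdict(set)
for section, statute in rows:
    sections_by_statute[statute].add(section)

# Rank statutes by how many distinct sections cite them, most cited first
ranking = sorted(
    ((statute, len(sections)) for statute, sections in sections_by_statute.items()),
    key=lambda pair: pair[1],
    reverse=True,
)
for statute, n in ranking:
    print(f'{n}\t{statute}')
```

To run this on the full results, `rows` could be replaced with `df.values.tolist()`. Note that citations to different subsections of the same statute (for example 22 U.S.C. 2593e(d) and 22 U.S.C. 2593e(e)) are counted separately unless they are normalized first.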