part_prefixes = ['9.']
Legal and Regulatory Analysis
In 2024, Senator Roger Wicker published a report on strengthening the defense industrial base and reforming defense acquisition. The latter involves cutting red tape by streamlining regulations.
In this tutorial, we will analyze the Federal Acquisition Regulations (FAR) to identify which of its sections are driven by statutory requirements. We treat a section as statute-driven when its text explicitly cites a statute.
For illustration purposes, we will focus our analysis on Part 9 of the FAR: contractor qualifications.
from onprem import LLM
from onprem.ingest import load_single_document, extract_files
from onprem import utils as U
from tqdm import tqdm
import pandas as pd
pd.set_option('display.max_colwidth', None)
STEP 1: Download the Data
We will first download the HTML version of the FAR.
import zipfile
import tempfile
import os
# URL of the ZIP file
url = "https://www.acquisition.gov/sites/default/files/current/far/zip/html/FARHTML.zip"
# Create a temporary directory
temp_dir = tempfile.mkdtemp()
zip_path = os.path.join(temp_dir, "FARHTML.zip")
# Download the ZIP file
U.download(url, zip_path, verify=True)
# Extract the ZIP file
with zipfile.ZipFile(zip_path, 'r') as zip_ref:
    zip_ref.extractall(temp_dir)
print(f"\nFiles extracted to: {temp_dir}")
# Keep only the HTML files whose names start with one of the part prefixes
filenames = [
    fname for fname in extract_files(temp_dir)
    if fname.lower().endswith('.html')
    and any(os.path.basename(fname).startswith(prefix) for prefix in part_prefixes)
]
print(f'Total files: {len(list(extract_files(temp_dir)))}')
print(f'Number of files of interest: {len(filenames)}')
print('Sample:')
for fname in filenames[:5]:
    print(f'\t{fname}')
[██████████████████████████████████████████████████]
Files extracted to: /tmp/tmp3xa2rj1w
Total files: 3900
Number of files of interest: 106
Sample:
/tmp/tmp3xa2rj1w/dita_html/9.406-3.html
/tmp/tmp3xa2rj1w/dita_html/9.505-4.html
/tmp/tmp3xa2rj1w/dita_html/9.201.html
/tmp/tmp3xa2rj1w/dita_html/9.406-5.html
/tmp/tmp3xa2rj1w/dita_html/9.104-1.html
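The filename filter above can be factored into a small helper, which makes it easier to see why the prefix is written as '9.' with a trailing dot. The following is a minimal sketch, not part of the original notebook: the function name select_part_files and the example paths are hypothetical and chosen only for illustration.

```python
import os


def select_part_files(paths, part_prefixes, extension=".html"):
    """Return the paths whose basename starts with one of the given
    prefixes and ends with the given extension (hypothetical helper)."""
    prefixes = tuple(part_prefixes)  # str.startswith accepts a tuple
    selected = []
    for path in paths:
        name = os.path.basename(path)
        if name.lower().endswith(extension) and name.startswith(prefixes):
            selected.append(path)
    return selected


# Hypothetical paths, used only to exercise the filter
example_paths = [
    "/tmp/far/dita_html/9.201.html",
    "/tmp/far/dita_html/19.201.html",   # Part 19, must not match '9.'
    "/tmp/far/dita_html/9.406-3.html",
    "/tmp/far/dita_html/Part_9.html",   # does not start with '9.'
    "/tmp/far/dita_html/9.201.xml",     # wrong extension
]
selected = select_part_files(example_paths, ["9."])
print(selected)
```

Because the match is anchored at the start of the basename, a file for Part 19 is not picked up by the '9.' prefix. Analyzing additional parts would only require adding more prefixes to the list.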
STEP 2: Text Extraction
We’ll extract text from each of the HTML files.
content = {}
for filename in tqdm(filenames, total=len(filenames)):
    text = load_single_document(filename)[0].page_content
    content[os.path.basename(filename)] = text
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████| 106/106 [00:06<00:00, 16.76it/s]
print(content['9.505-4.html'])
9.505-4 Obtaining access to proprietary information.
(a) When a contractor requires proprietary information from others to perform a Government contract and can use the leverage of the contract to obtain it, the contractor may gain an unfair competitive advantage unless restrictions are imposed. These restrictions protect the information and encourage companies to provide it when necessary for contract performance. They are not intended to protect information-
(1) Furnished voluntarily without limitations on its use; or
(2) Available to the Government or contractor from other sources without restriction.
(b) A contractor that gains access to proprietary information of other companies in performing advisory and assistance services for the Government must agree with the other companies to protect their information from unauthorized use or disclosure for as long as it remains proprietary and refrain from using the information for any purpose other than that for which it was furnished. The contracting officer shall obtain copies of these agreements and ensure that they are properly executed.
(c) Contractors also obtain proprietary and source selection information by acquiring the services of marketing consultants which, if used in connection with an acquisition, may give the contractor an unfair competitive advantage. Contractors should make inquiries of marketing consultants to ensure that the marketing consultant has provided no unfair competitive advantage.
STEP 3: Setup LLM and Test Prompt
Next, we will set up the LLM, construct a prompt for this task, and test it on a small sample of passages from the FAR.
Since the FAR is a publicly available document, we will use a cloud LLM (i.e., gpt-4o-mini) for this task.
llm = LLM(model_url='openai://gpt-4o-mini', mute_stream=True, temperature=0)
/home/amaiya/projects/ghub/onprem/onprem/llm/base.py:217: UserWarning: The model you supplied is gpt-4o-mini, an external service (i.e., not on-premises). Use with caution, as your data and prompts will be sent externally.
warnings.warn(f'The model you supplied is {self.model_name}, an external service (i.e., not on-premises). '+\
prompt = """
Given text from the Federal Acquisition Regulations (FAR), extract a list of explicitly cited statutes.
If there are no explicitly cited statutes, return NA. If there are, return a list of cited statutes with each statute on a separate line.
Do not include references to the FAR itself which are numbers with dots or dashes (e.g., 1.102-1, 3.104).
# Example 1:
<TEXT>
(2)A violation, as determined by the Secretary of Commerce, of any agreement of the group known as the "Coordination Committee" for purposes of the Export Administration Act of 1979 (50 U.S.C. App. 2401, et seq.) or any similar bilateral or multilateral export control agreement.
<STATUTES>
50 U.S.C. App. 2401
# Example 2:
<TEXT>
9.400 Scope of subpart.
(a) This subpart-
(1) Prescribes policies and procedures governing the debarment and suspension of contractors by agencies for the causes given in 9.406-2 and 9.407-2;
(2) Provides for the listing of contractors debarred, suspended, proposed for debarment, and declared ineligible (see the definition of "ineligible" in 2.101); and
(3) Sets forth the consequences of this listing.
<STATUTES>
NA
# Example 3:
<TEXT>
--CONTENT--
<STATUTES>
"""
samples = ['9.104-1.html', '9.104-2.html', '9.104-3.html', '9.104-4.html', '9.104-5.html', '9.104-6.html', '9.104-7.html']
results = []
for sample in samples:
    output = llm.prompt(prompt.replace('--CONTENT--', content[sample]))
    # One (section, statute) row per line of output, skipping the NA marker
    results.extend([(sample, o.strip()) for o in output.strip().split('\n') if o != 'NA'])
    #print(output)
df = pd.DataFrame(results, columns=['Section', 'Statute'])
df.head()
| | Section | Statute |
|---|---|---|
0 | 9.104-5.html | Pub. L. 113-235 |
1 | 9.104-6.html | 41 U.S.C. 2313(d)(3) |
2 | 9.104-7.html | Pub. L. 113-235 |
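The parsing step splits the model output on newlines and skips the NA marker. A slightly more defensive version is sketched below, assuming the model may occasionally add whitespace, bullet characters, or a FAR self-reference despite the prompt. The function name parse_statutes and the FAR_REF pattern are assumptions introduced for illustration and are not part of onprem.

```python
import re

# Assumed pattern for FAR self-references such as 1.102-1 or 3.104
FAR_REF = re.compile(r"^\d+\.\d+(-\d+)?$")


def parse_statutes(section, output):
    """Turn raw LLM output into (section, statute) pairs (hypothetical helper)."""
    pairs = []
    for line in output.splitlines():
        # Drop surrounding whitespace and any leading bullet characters
        item = line.strip().lstrip("-*• ").strip()
        if not item or item.upper() == "NA":
            continue  # blank line or the "no statutes" marker
        if FAR_REF.match(item):
            continue  # a reference to the FAR itself, not a statute
        pairs.append((section, item))
    return pairs


# Hypothetical model outputs, used only to exercise the parser
print(parse_statutes("9.104-6.html", "41 U.S.C. 2313(d)(3)\n"))
print(parse_statutes("9.400.html", " NA "))
print(parse_statutes("9.403.html", "- 19 U.S.C. 1337\n9.406-2\n\n* 50 U.S.C. App. 2401"))
```

The regular expression only covers section numbers of the form shown in the prompt, so it is a second line of defense behind the prompt instruction and not a replacement for it.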
STEP 4: Run Analyses on FAR
results = []
for k in tqdm(content, total=len(content)):
    output = llm.prompt(prompt.replace('--CONTENT--', content[k]))
    results.extend([(k, o.strip()) for o in output.strip().split('\n') if o != 'NA'])
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████| 106/106 [00:55<00:00, 1.90it/s]
df = pd.DataFrame(results, columns=['FAR Section', 'Cited Statute'])
df = df.sort_values(by='FAR Section')
df.head(50)
| | FAR Section | Cited Statute |
|---|---|---|
46 | 9.103.html | 15 U.S.C. 637 |
31 | 9.104-5.html | Pub. L. 113-235 |
7 | 9.104-6.html | 41 U.S.C. 2313 |
0 | 9.104-7.html | Pub. L. 113-235 |
27 | 9.105-2.html | Pub. L. 111-212 |
44 | 9.106-4.html | 15 U.S.C. 637 |
28 | 9.107.html | 41 U.S.C. chapter 85 |
4 | 9.108-1.html | 6 U.S.C. 395(c) |
3 | 9.108-1.html | 6 U.S.C. 395(b) |
25 | 9.108-2.html | Pub. L. 110-161 |
22 | 9.109-1.html | 22 U.S.C. 2593e |
39 | 9.109-4.html | 22 U.S.C. 2593e(d) |
40 | 9.109-4.html | 22 U.S.C. 2593e(e) |
41 | 9.109-4.html | 22 U.S.C. 2593e(g)(2) |
42 | 9.109-4.html | 22 U.S.C. 2593e(b) |
38 | 9.109-4.html | 22 U.S.C. 2593a |
37 | 9.110-1.html | 20 U.S.C. 1001 |
45 | 9.110-2.html | 10 U.S.C. 983 |
33 | 9.110-3.html | 10 U.S.C. 983 |
23 | 9.200.html | 10 U.S.C. 3243 |
24 | 9.200.html | 41 U.S.C. 3311 |
1 | 9.400.html | 10 U.S.C. 983 |
2 | 9.400.html | Federal Acquisition Supply Chain Security Act (FASCSA) |
43 | 9.401.html | 31 U.S.C. 6101, note |
29 | 9.402.html | 31 U.S.C. 6101, note |
30 | 9.402.html | Pub. L. 110-417 |
15 | 9.403.html | 31 U.S.C. 3801-3812 |
14 | 9.403.html | 19 U.S.C. 1337 |
13 | 9.403.html | 50 U.S.C. App. 2401 |
26 | 9.405-1.html | 10 U.S.C. 983 |
32 | 9.405.html | 22 U.S.C. 2593e |
21 | 9.406-2.html | Title 18 of the United States Code |
17 | 9.406-2.html | 11 U.S.C. 362 |
20 | 9.406-2.html | I.R.C. §6159 |
19 | 9.406-2.html | I.R.C. §6320 |
16 | 9.406-2.html | 31 U.S.C. 3729-3733 |
18 | 9.406-2.html | I.R.C. §6212 |
12 | 9.406-4.html | 41 U.S.C. chapter 81 |
10 | 9.407-2.html | Title 18 of the United States Code |
9 | 9.407-2.html | Public Law 102-558 |
8 | 9.407-2.html | 41 U.S.C. chapter 81 |
11 | 9.407-2.html | 31 U.S.C. 3729-3733 |
5 | 9.500.html | Pub.L.100-463 |
6 | 9.500.html | 102 Stat.2270-47 |
35 | 9.701.html | 15 U.S.C. 640 |
34 | 9.701.html | 15 U.S.C. 638 |
36 | 9.701.html | 50 U.S.C. App. 2158 |
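The table shows the same kind of citation written in several ways, for example "Pub. L. 113-235", "Pub.L.100-463", and "Public Law 102-558". If the citations are to be grouped or counted, it may help to normalize them first. The sketch below handles only Public Law citations and whitespace. The function name normalize_citation and the pattern are assumptions for illustration, and other citation styles are passed through unchanged.

```python
import re

# Matches "Pub. L. 113-235", "Pub.L.100-463", and "Public Law 102-558"
PUBLIC_LAW = re.compile(r"^(?:Pub\.\s*L\.|Public\s+Law)\s*(\d+)-(\d+)$", re.IGNORECASE)


def normalize_citation(citation):
    """Collapse whitespace and rewrite Public Law citations in one style
    (hypothetical helper)."""
    text = " ".join(citation.split())  # collapse runs of whitespace
    match = PUBLIC_LAW.match(text)
    if match:
        return f"Pub. L. {match.group(1)}-{match.group(2)}"
    return text


for raw in ["Pub.L.100-463", "Public Law 102-558", "Pub. L. 113-235", "10 U.S.C.  983"]:
    print(raw, "->", normalize_citation(raw))
```

In the notebook, this could be applied with something like df['Cited Statute'].map(normalize_citation) before grouping.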
unique_sections = set(df['FAR Section'].values.tolist())
print(f"Out of {len(filenames)} sections or subsections in PART 9 (contractor qualifications) of the FAR, {len(unique_sections)} are driven by statutory requirements.")
Out of 106 sections or subsections in PART 9 (contractor qualifications) of the FAR, 26 are driven by statutory requirements.
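Beyond the headline count (26 of 106 sections, roughly 24.5%), the same (section, statute) pairs can show which statutes are cited by the most sections. The following is a minimal sketch using only the standard library. The function name summarize is hypothetical, and the rows are a small subset copied from the table above, so the numbers it prints describe that subset only.

```python
from collections import Counter


def summarize(pairs, total_sections, top_n=3):
    """Return (number of citing sections, share of all sections,
    most frequently cited statutes). Hypothetical helper."""
    unique_pairs = set(pairs)  # count each statute once per section
    sections = {section for section, _ in unique_pairs}
    statute_counts = Counter(statute for _, statute in unique_pairs)
    share = len(sections) / total_sections
    # Sort by count (descending), then name, for a deterministic order
    ranked = sorted(statute_counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return len(sections), share, ranked[:top_n]


# A subset of rows from the results table, for illustration only
rows = [
    ("9.110-2.html", "10 U.S.C. 983"),
    ("9.110-3.html", "10 U.S.C. 983"),
    ("9.400.html", "10 U.S.C. 983"),
    ("9.405-1.html", "10 U.S.C. 983"),
    ("9.103.html", "15 U.S.C. 637"),
    ("9.106-4.html", "15 U.S.C. 637"),
]
n_sections, share, top = summarize(rows, total_sections=106)
print(f"{n_sections} sections ({share:.1%}); most cited: {top}")
```

With the full DataFrame, the pairs could be obtained with something like list(df.itertuples(index=False, name=None)).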