Using BeautifulSoup To Extract A Table In Python 3
Solution 1:
So you already have this:
datasets = [
    (('Tests', '103'), ('Failures', '24'), ('Success Rate', '76.70%'), ('Average Time', '71 ms'), ('Min Time', '0 ms'), ('Max Time', '829 ms')),
    (('Tests', '109'), ('Failures', '35'), ('Success Rate', '82.01%'), ('Average Time', '12 ms'), ('Min Time', '2 ms'), ('Max Time', '923 ms'))
]
Here's how you can transform it. Assuming every row has the same fields in the same order, you can extract the headers from the first row:
headers_row = [hdr for hdr, data in datasets[0]]
Now extract the second field of each tuple (the '103' in ('Tests', '103')) in each row:
processed_rows = [
    [data for hdr, data in row]
    for row in datasets
]
# [['103', '24', '76.70%', '71 ms', '0 ms', '829 ms'], ['109', '35', '82.01%', '12 ms', '2 ms', '923 ms']]
Now you have the header row (headers_row) and a list of data rows (processed_rows). You can write them to a CSV file with the standard csv module.
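That last step could look like the minimal sketch below. The helper name write_table is hypothetical, and the sketch writes to an in-memory buffer instead of a real file so it has no side effects; the sample data is copied from above.

```python
import csv
import io

# Sample data copied from the question; in practice it comes from the scraped table.
datasets = [
    (('Tests', '103'), ('Failures', '24'), ('Success Rate', '76.70%'),
     ('Average Time', '71 ms'), ('Min Time', '0 ms'), ('Max Time', '829 ms')),
    (('Tests', '109'), ('Failures', '35'), ('Success Rate', '82.01%'),
     ('Average Time', '12 ms'), ('Min Time', '2 ms'), ('Max Time', '923 ms')),
]

headers_row = [hdr for hdr, data in datasets[0]]
processed_rows = [[data for hdr, data in row] for row in datasets]


def write_table(fileobj, headers, rows):
    """Write one header line followed by the data rows (hypothetical helper)."""
    writer = csv.writer(fileobj)
    writer.writerow(headers)
    writer.writerows(rows)


# For a real file, pass open('data.csv', 'w', newline='') instead of the buffer.
buffer = io.StringIO()
write_table(buffer, headers_row, processed_rows)
csv_text = buffer.getvalue()
print(csv_text)
```

Note that csv.writer ends each line with '\r\n' by default, which is why the csv documentation recommends opening real files with newline=''.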
A better solution may be to keep your original format and use csv.DictWriter.
First, extract the headers into headers_row, as shown above. Then write the data:
import csv

with open('data.csv', 'w', newline='') as csvfile:
    writer = csv.DictWriter(csvfile, fieldnames=headers_row)
    writer.writeheader()
    for row in datasets:  # your original data
        writer.writerow(dict(row))
Here dict(datasets[0]), for example, is:
{'Tests': '103', 'Failures': '24', 'Success Rate': '76.70%', 'Average Time': '71 ms', 'Min Time': '0 ms', 'Max Time': '829 ms'}
Solution 2:
At the end, just convert your zip iterator to a list:
for row in table.find_all("tr")[1:]:
    dataset = zip(headings, (td.get_text() for td in row.find_all("td")))
    datasets.append(list(dataset))  # process iterator to list

print(datasets)
Final Output:
[[('Tests', '103'),
('Failures', '24'),
('Success Rate', '76.70%'),
('Average Time', '71 ms'),
('Min Time', '0 ms'),
('Max Time', '829 ms')],
[('Tests', '109'),
('Failures', '35'),
('Success Rate', '82.01%'),
('Average Time', '12 ms'),
('Min Time', '2 ms'),
('Max Time', '923 ms')]]
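The reason the list() call matters: in Python 3, zip returns a lazy iterator that can be consumed only once, so appending it directly stores an object that prints as something like <zip object at 0x...> instead of the pairs. A small illustration, using shortened stand-in values rather than the real table:

```python
# Shortened stand-in data, only to illustrate zip's behaviour.
headings = ['Tests', 'Failures']
cells = ['103', '24']

pairs = zip(headings, cells)   # lazy iterator, nothing is computed yet
first_pass = list(pairs)       # consumes the iterator
second_pass = list(pairs)      # the iterator is now exhausted

print(first_pass)   # [('Tests', '103'), ('Failures', '24')]
print(second_pass)  # []
```

Converting to a list (or tuple) right away, as in the loop above, avoids both the unhelpful printout and the single-use problem.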
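If you want to convert the dataset to a CSV string, use this code. Note that a plain join does not quote values, so it is only safe when no value contains a comma, a double quote, or a newline: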
# convert to csv string
hdrline = ','.join(e[0] for e in datasets[0]) + "\n"
data = ""
for rw in datasets:
    data += ','.join([e[1] for e in rw]) + "\n"
csvstr = hdrline + data
print(csvstr)
Output:
Tests,Failures,Success Rate,Average Time,Min Time,Max Time
103,24,76.70%,71 ms,0 ms,829 ms
109,35,82.01%,12 ms,2 ms,923 ms
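If a cell might contain a comma, one alternative is to let csv.writer build the string in an io.StringIO buffer, since it quotes such values automatically. This is a sketch, not part of the original answer: the function name to_csv_string and the sample row with 'Smoke, login' are made up for illustration.

```python
import csv
import io


def to_csv_string(datasets):
    """Build a CSV string from rows of (header, value) pairs (hypothetical helper)."""
    buffer = io.StringIO()
    # lineterminator='\n' matches the "\n" line endings used in the join version.
    writer = csv.writer(buffer, lineterminator='\n')
    writer.writerow([hdr for hdr, _ in datasets[0]])
    writer.writerows([value for _, value in row] for row in datasets)
    return buffer.getvalue()


# Made-up row whose first value contains a comma.
sample = [[('Name', 'Smoke, login'), ('Tests', '103')]]
csvstr = to_csv_string(sample)
print(csvstr)
```

The value with the comma comes out wrapped in double quotes, so a CSV reader still sees two columns.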
Solution 3:
If you are using the standard csv module, you don't need to associate values with their labels. You can do the following, assuming you have a csvwriter, which can be obtained via csv.writer (https://docs.python.org/3.8/library/csv.html#csv.writer):
import csv
...
with open('file.csv', 'w', newline='') as csvfile:
    csvwriter = csv.writer(csvfile)  # You may add options here to format your csv file as needed
    headings = [th.get_text() for th in table.find("tr").find_all("th")]
    csvwriter.writerow(headings)
    for row in table.find_all("tr")[1:]:
        data = (td.get_text() for td in row.find_all("td"))
        csvwriter.writerow(data)