html scraping in either batch or powershell [closed]
I need to scrape the HTML of a site that is launched from a .url file, find a certain line, and grab every line below it up to a certain point. An example of the HTML is below:

</p><ul><li>(None)</li></ul><h2><span style="font-size:18px;">Authorized Administrators and Users</span></h2><pre><b>Authorized Administrators:</b>
jim (you)
    password: (blank/none)
bob
    password: Littl3@birD
batman
    password: 3ndur4N(e&home
dab
    password: captain
<b>Authorized Users:</b>
bag
crab
oliver
james
scott
john
apple
</pre><h2><span style="font-size:18px;">Competition Guidelines</span></h2>

I need to get all of the authorized administrators into a txt file, the authorized users into a txt file, and both into another txt file. Could this be accomplished with just batch and PowerShell?
html powershell batch-file web-scraping
edited Nov 12 '18 at 13:03 by mklement0
asked Nov 11 '18 at 19:13 by LandonBB
closed as too broad by marc_s, Squashman, Matt, Gerhard Barnard, jeb Nov 12 '18 at 12:56
Please edit the question to limit it to a specific problem with enough detail to identify an adequate answer. Avoid asking multiple distinct questions at once. See the How to Ask page for help clarifying this question. If this question can be reworded to fit the rules in the help center, please edit the question.
3 Answers
I believe that this answer shows useful techniques, and I've verified that it works with the sample input, within the constraints stated. Do tell us (with words) if you disagree, so the answer can be improved.
Generally, as stated, using a dedicated HTML parser is preferable, but given the easily identifiable enclosing tags in your input (assuming there'll be no variations), you can get away with a regex-based solution.
Here's a regex-based PSv4+ solution, but note that it relies on the input containing whitespace (line breaks, leading spaces) exactly as shown in your question:

# $html is assumed to contain the input HTML text (can be a full document).
$admins, $users = (
  # Split the HTML text into the sections of interest.
  $html -split
    '(?s)\A.*<b>Authorized Administrators:</b>|<b>Authorized Users:</b>' `
    -ne '' `
    -replace '(?s)<.*'
).ForEach({
  # Extract admin lines and user lines each, as an array.
  , ($_ -split '\r?\n' -ne '')
})

# Clean up the $admins array and transform the username-password pairs
# into custom objects with .username and .password properties.
$admins = $admins -split '\s+password:\s+' -ne ''
$i = 0
$admins = $admins.ForEach({
  if ($i++ % 2 -eq 0) { $co = [pscustomobject] @{ username = $_; password = '' } }
  else { $co.password = $_; $co }
})

# Create custom objects with the same structure for the users.
$users = $users.ForEach({
  [pscustomobject] @{ username = $_; password = '' }
})

# Output to CSV files.
$admins | Export-Csv admins.csv
$users | Export-Csv users.csv
$admins + $users | Export-Csv all.csv

Assumptions are made about the desired output format (and HTML entities such as &amp; aren't decoded), given that your question doesn't flesh out the requirements.
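
If decoding is wanted, one option is .NET's [System.Net.WebUtility]::HtmlDecode. The following is only a minimal sketch: the sample object and its password value are made up for illustration, and it assumes objects with .username and .password properties like the ones built above.

```powershell
# Hypothetical sample object; in practice this would be an element of $admins.
$sample = [pscustomobject] @{ username = 'batman'; password = '3ndur4N(e&amp;home' }

# Decode HTML entities in the password field (&amp; becomes &).
$sample.password = [System.Net.WebUtility]::HtmlDecode($sample.password)

$sample.password
```

The same call could be applied to each .username and .password value before exporting.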
edited Nov 12 '18 at 12:49
answered Nov 11 '18 at 21:50 by mklement0
Here's my attempt to get what you are after.

$url        = '<THE URL TAKEN FROM THE .URL SHORTCUT FILE>'
$outputPath = '<THE PATH WHERE YOU WANT THE CSV FILES TO BE CREATED>'

# get the content of the web page
$html = (Invoke-WebRequest -Uri $url).Content

# load the assembly to de-entify the HTML content
Add-Type -AssemblyName System.Web
$html = [System.Web.HttpUtility]::HtmlDecode($html)

# get the Authorized Admins block (lazy match: stop at the next <b> tag)
if ($html -match '(?s)<b>Authorized Administrators:</b>(.+?)<b>') {
    $adminblock = $matches[1].Trim()

    # inside this text block, get the admin usernames and passwords
    $admins = @()
    $regex = [regex] '(?m)^(?<name>.+)\s*password:\s+(?<password>.+)'
    $match = $regex.Match($adminblock)
    while ($match.Success) {
        $admins += [PSCustomObject]@{
            'Name'     = $($match.Groups['name'].Value -replace '\(you\)', '').Trim()
            'Type'     = 'Admin'
            # comment out this next property if you don't want passwords in the output
            'Password' = $match.Groups['password'].Value.Trim()
        }
        $match = $match.NextMatch()
    }
}
else {
    Write-Warning "Could not find 'Authorized Administrators' text block."
}

# get the Authorized Users block (lazy match: stop at the first </pre>)
if ($html -match '(?s)<b>Authorized Users:</b>(.+?)</pre>') {
    $userblock = $matches[1].Trim()

    # inside this text block, get the authorized usernames
    $users = @()
    $regex = [regex] '(?m)(?<name>.+)'
    $match = $regex.Match($userblock)
    while ($match.Success) {
        $users += [PSCustomObject]@{
            'Name' = $match.Groups['name'].Value.Trim()
            'Type' = 'User'
        }
        $match = $match.NextMatch()
    }
}
else {
    Write-Warning "Could not find 'Authorized Users' text block."
}

# write the csv files
$admins | Export-Csv -Path $(Join-Path -Path $outputPath -ChildPath 'admins.csv') -NoTypeInformation -Force
$users | Export-Csv -Path $(Join-Path -Path $outputPath -ChildPath 'users.csv') -NoTypeInformation -Force
($admins + $users) | Export-Csv -Path $(Join-Path -Path $outputPath -ChildPath 'adminsandusers.csv') -NoTypeInformation -Force

When finished, you will have three CSV files:

admins.csv

Name   Type  Password
----   ----  --------
jim    Admin (blank/none)
bob    Admin Littl3@birD
batman Admin 3ndur4N(e&home
dab    Admin captain

users.csv

Name   Type
----   ----
bag    User
crab   User
oliver User
james  User
scott  User
john   User
apple  User

adminsandusers.csv

Name   Type  Password
----   ----  --------
jim    Admin (blank/none)
bob    Admin Littl3@birD
batman Admin 3ndur4N(e&home
dab    Admin captain
bag    User
crab   User
oliver User
james  User
scott  User
john   User
apple  User
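
The $url placeholder above has to be filled in from the .url shortcut. Such shortcut files are normally INI-style text with a URL= line under an [InternetShortcut] section, so the address can likely be read with Select-String. This is only a sketch under that assumption; the file name example.url and the address in it are hypothetical and are created here just so the snippet runs on its own.

```powershell
# Hypothetical shortcut file, written to the temp folder for illustration only.
$shortcut = Join-Path ([System.IO.Path]::GetTempPath()) 'example.url'
Set-Content -LiteralPath $shortcut -Value '[InternetShortcut]', 'URL=https://example.com/readme.html'

# Take the value of the first line that starts with URL=.
$line = Select-String -LiteralPath $shortcut -Pattern '^URL=(.+)$' | Select-Object -First 1
$url  = if ($line) { $line.Matches[0].Groups[1].Value.Trim() } else { $null }

$url
```

If no URL= line is found, $url ends up as $null, which is worth checking before calling Invoke-WebRequest.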
answered Nov 11 '18 at 21:12 by Theo
This is really rather ugly, and very emphatically fragile. A good HTML parser would be a better way to do this.
However, presuming you don't have the resources for that, here's one way to grab the data. If you REALLY want to generate two more files [Admin & User], you can do that from this object ...

# fake reading in a text file
#    in real life, use Get-Content
$InStuff = @'
</p><ul><li>(None)</li></ul><h2><span style="font-size:18px;">Authorized Administrators and Users</span></h2><pre><b>Authorized Administrators:</b>
jim (you)
    password: (blank/none)
bob
    password: Littl3@birD
batman
    password: 3ndur4N(e&home
dab
    password: captain
<b>Authorized Users:</b>
bag
crab
oliver
james
scott
john
apple
</pre><h2><span style="font-size:18px;">Competition Guidelines</span></h2>
'@ -split [environment]::NewLine

# drop closing-tag lines, indented password lines, and blank lines
$CleanedInStuff = $InStuff.
    Where({
        $_ -notmatch '^</' -and
        $_ -notmatch '^ ' -and
        $_
    })

$UserType = 'Administrator'
$UserInfo = foreach ($CIS_Item in $CleanedInStuff) {
    # the <b>Authorized Users:</b> line marks the switch from admins to users
    if ($CIS_Item.StartsWith('<b>')) {
        $UserType = 'User'
        continue
    }
    [PSCustomObject]@{
        Name     = $CIS_Item.Trim()
        UserType = $UserType
    }
}

# on screen
$UserInfo

# to CSV
$UserInfo |
    Export-Csv -LiteralPath "$env:TEMP\LandonBB.csv" -NoTypeInformation

on screen output ...

Name      UserType
----      --------
jim (you) Administrator
bob       Administrator
batman    Administrator
dab       Administrator
bag       User
crab      User
oliver    User
james     User
scott     User
john      User
apple     User

CSV file content ...
"Name","UserType"
"jim (you)","Administrator"
"bob","Administrator"
"batman","Administrator"
"dab","Administrator"
"bag","User"
"crab","User"
"oliver","User"
"james","User"
"scott","User"
"john","User"
"apple","User"
answered Nov 11 '18 at 20:04 by Lee_Dailey
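
The question asks for three txt files, and the answer above notes that separate Admin and User files can be generated from the $UserInfo object. One possible way to do that is sketched below. The two-item $UserInfo here is a hypothetical stand-in for the real collection, and the output folder and file names (admins.txt, users.txt, all.txt) are assumptions, not something the question specifies.

```powershell
# Hypothetical stand-in for the $UserInfo collection built in the answer above.
$UserInfo = @(
    [PSCustomObject]@{ Name = 'bob'; UserType = 'Administrator' }
    [PSCustomObject]@{ Name = 'bag'; UserType = 'User' }
)

# Assumed output location; change as needed.
$OutDir = [System.IO.Path]::GetTempPath()

# Collect the names per type; @() keeps each result an array even for one item.
$AdminNames = @($UserInfo | Where-Object { $_.UserType -eq 'Administrator' } | ForEach-Object { $_.Name })
$UserNames  = @($UserInfo | Where-Object { $_.UserType -eq 'User' } | ForEach-Object { $_.Name })

# One name per line in each text file.
Set-Content -LiteralPath (Join-Path $OutDir 'admins.txt') -Value $AdminNames
Set-Content -LiteralPath (Join-Path $OutDir 'users.txt')  -Value $UserNames
Set-Content -LiteralPath (Join-Path $OutDir 'all.txt')    -Value ($AdminNames + $UserNames)
```

Writing plain names keeps the files easy to consume from a batch script, for example with a for /f loop.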