Copy certain lines from one text file to another
Copy certain lines from one text file to another
I need to open a file, read a line and if that line respect some conditions write it to another file, the line that I'm reading is a normal ASCII string representig HEX values and I need to paste it into a new file as HEX values and not as ASCII string.
What I have is this:
private void Button_Click(object sender, RoutedEventArgs e)
byte arrayByte = 0x00 ;
var linesToKeep = File.ReadLines(fileName).Where(l => l.Contains(":10"));
foreach (string line in linesToKeep)
string partialA = line.Substring(9);
string partialB = partialA.Remove(partialA.Length - 2);
arrayByte = ToByteArray(partialB);
using (var stream = new FileStream(fileName+"_gugggu", FileMode.OpenOrCreate))
FileInfo file = null;
file = new FileInfo(fileName + "_gugggu");
stream.Position = file.Length;
stream.Write(arrayByte, 0, arrayByte.Length);
public static byte ToByteArray(String HexString)
int NumberChars = HexString.Length;
byte bytes = new byte[NumberChars / 2];
for (int i = 0; i < NumberChars; i += 2)
bytes[i / 2] = Convert.ToByte(HexString.Substring(i, 2), 16);
return bytes;
This method is doing what I need but it takes ages to finish, the original files have roughly 70000 lines... Is there a better way to do that in order to increase speed?
EDIT
SOURCE FILE EXAMPLE:
:106AD00000000000000000000000000000000000B6
:106AE00000000000000000000000000000000000A6
:106AF0000000000000000000000000000000000096
:106B00000000000000000000000000000000000085
:106B10000000000000000000000000000000000075
:106B20000000000000000000000000000000000065
:106B30000000000000000000000000000000000055
:106B40000000000000000000000000000000000045
:106B50000000000000000000000000000000000035
:106B60000000000000000000000000000000000025
$begingroup$
I can give one quick tipp. Don't read the entire file with
ReadLines
. Instead read it line by line as you go like this. This way you only read it once and not twice like you're doing this now. You can use streams also for reading.$endgroup$
– t3chb0t
Sep 17 '18 at 14:55
ReadLines
$begingroup$
@t3chb0t:
ReadLines
is okay: IEnumerable<string> ReadLines (string path)
. It reads only the line you are currently enumerating.$endgroup$
– Olivier Jacot-Descombes
Sep 17 '18 at 16:03
ReadLines
IEnumerable<string> ReadLines (string path)
$begingroup$
Terminology: "as HEX values" still means as a string. Hexadecimal is a way of representing binary data in a human-readable way, with 2 characters per 8-bit byte. Describing an array of bytes as "hex values" is incorrect, unless those bytes represent ASCII codes, or each byte is limited to the range 0..15 (i.e. each byte is a hex digit, not the ASCII code for a hex digit). The term you're looking for is "binary string", or "binary data". And BTW, the function of your whole program is a reverse hexdump: turn hex into binary, like the
xxd
command.$endgroup$
– Peter Cordes
Sep 18 '18 at 3:05
xxd
2 Answers
2
It is OK to use File.ReadLines(...)
because it actually is shortcut to StreamReader.ReadLine()
(Not to be confused with File.ReadAllLines()
).
File.ReadLines(...)
StreamReader.ReadLine()
File.ReadAllLines()
I wonder why you reopen the same file for each line you want to save. I would do something like this:
byte arrayByte = 0x00 ;
using (var stream = new FileStream(fileName + "_gugggu", FileMode.Create))
{
foreach (string line in File.ReadLines(fileName).Where(l => l.Contains(":10")))
{
...
This may be the real bottleneck.
I think
string partialA = line.Substring(9);
string partialB = partialA.Remove(partialA.Length - 2);
is the same as:
string subString = line.Substring(9, line.Length - 11); // 11 = 9 + 2
All in all it could be changed to :
byte arrayByte = 0x00 ;
using (var stream = new FileStream(fileName + "_gugggu", FileMode.Create))
foreach (string line in File.ReadLines(fileName).Where(l => l.Contains(":10")))
string subString = line.Substring(9, line.Length - 11);
arrayByte = ToByteArray(subString);
stream.Write(arrayByte, 0, arrayByte.Length);
A few small things to add to Henrik Hansen's answer.
This is a bit odd:
byte arrayByte = 0x00 ;
There is no need to assign this a value at all when declaring it: not doing so would prevent its use before it is assigned a meaningful value, which is good. If you must use a meaningless initial value (for whatever reason), a null
would probably be better, as it will likely crash any code which tries to use it, rather than doing something meaningless/destructive.
null
Here, it would be better to simply declare arrayByte
when you use it inside the loop: it's always good to define things as close to where they are used as possible, because it makes it clearer what their purpose is and limits their technical scope to their conceptual scope (e.g. it wouldn't make any sense to access arrayByte
after the loop).
arrayByte
arrayByte
byte arrayByte = ToByteArray(subString);
As Henrik Handsen points out, there is no need to repeatedly open the output file and seek to the end. However, if ever you must seek to the end of a FileStream
, you can use the Seek(long, SeekOrigin)
. This expresses the intention much more clearly than the non-trivial FileInfo
route, and works with certain other types of stream as well.
FileStream
Seek(long, SeekOrigin)
FileInfo
It's unusual to see a method parameter or local variable in ProperCamelCase
:
ProperCamelCase
public static byte ToByteArray(String HexString)
{
int NumberChars = HexString.Length;
The private variable is completely hidden from the external API, so is relatively unimportant, but most guidelines suggest using lowerCamelCase
for local variables and method parameters.
lowerCamelCase
This method does a fair amount of work, and has nothing to do with the UI it is (I'm guessing from the name) embedded in. If it is taking a while, that will also provide a terrible user experience, as it will block the UI thread, stalling the application.
I would suggest, therefore, that it be pulled out into its own method (taking a filename
parameter, rather than pulling it from what I assume is a private field/property), and perhaps making it async
(maybe even with progress reporting), so that it can be conveniently run away from the UI thread.
filename
async
You have a couple of 'magic numbers' and 'magic strings' in your code, which are essentially meaningless to anyone unfamiliar with the domain. Why, for example, do you look for ":10"
, why do you skip the first 9 characters, and why do you remove the last 2? All of these warrant some form of documentation, ideally through the use of meaningful prior definitions (ideally with inline doc), or at least some comments explaining why those numbers are what they are.
":10"
Your code even contains "_gugggu"
twice, so it's an even more obvious candidate for being replaced with a well-named variable/constant/whatever. It is all-too-easy to change such magic constants in one place but to forget to change them in another.
"_gugggu"
Additionally, inline documentation (///
) on any public facing methods (ToByteArray
) is always nice, and a few more carefully placed empty lines and otherwise consistent whitespace would really help to break up the logic a bit and keep the code looking tidy (e.g. fileName+"_gugggu"
vs. fileName + "_gugggu"
///
ToByteArray
fileName+"_gugggu"
fileName + "_gugggu"
$begingroup$
Oh, I forgot I shouln't vote until I haven't read everything ;-P
$endgroup$
– t3chb0t
Sep 17 '18 at 18:18
$begingroup$
@t3chb0t don't worry, I wasn't watching that closely ;)
$endgroup$
– VisualMelon
Sep 17 '18 at 18:20
Thanks for contributing an answer to Code Review Stack Exchange!
But avoid …
Use MathJax to format equations. MathJax reference.
To learn more, see our tips on writing great answers.
Required, but never shown
Required, but never shown
By clicking “Post Your Answer”, you agree to our terms of service, privacy policy and cookie policy
$begingroup$
@t3chb0t edited my question to include source file example.
$endgroup$
– FabioEnne
Sep 17 '18 at 14:53