Copy certain lines from one text file to another

Copy certain lines from one text file to another



I need to open a file, read a line and if that line respect some conditions write it to another file, the line that I'm reading is a normal ASCII string representig HEX values and I need to paste it into a new file as HEX values and not as ASCII string.



What I have is this:


private void Button_Click(object sender, RoutedEventArgs e)

byte arrayByte = 0x00 ;
var linesToKeep = File.ReadLines(fileName).Where(l => l.Contains(":10"));
foreach (string line in linesToKeep)


string partialA = line.Substring(9);
string partialB = partialA.Remove(partialA.Length - 2);
arrayByte = ToByteArray(partialB);
using (var stream = new FileStream(fileName+"_gugggu", FileMode.OpenOrCreate))

FileInfo file = null;
file = new FileInfo(fileName + "_gugggu");
stream.Position = file.Length;
stream.Write(arrayByte, 0, arrayByte.Length);



public static byte ToByteArray(String HexString)

int NumberChars = HexString.Length;
byte bytes = new byte[NumberChars / 2];
for (int i = 0; i < NumberChars; i += 2)

bytes[i / 2] = Convert.ToByte(HexString.Substring(i, 2), 16);

return bytes;



This method is doing what I need but it takes ages to finish, the original files have roughly 70000 lines... Is there a better way to do that in order to increase speed?



EDIT
SOURCE FILE EXAMPLE:


:106AD00000000000000000000000000000000000B6
:106AE00000000000000000000000000000000000A6
:106AF0000000000000000000000000000000000096
:106B00000000000000000000000000000000000085
:106B10000000000000000000000000000000000075
:106B20000000000000000000000000000000000065
:106B30000000000000000000000000000000000055
:106B40000000000000000000000000000000000045
:106B50000000000000000000000000000000000035
:106B60000000000000000000000000000000000025





$begingroup$
@t3chb0t edited my question to include source file example.
$endgroup$
– FabioEnne
Sep 17 '18 at 14:53





$begingroup$
I can give one quick tipp. Don't read the entire file with ReadLines. Instead read it line by line as you go like this. This way you only read it once and not twice like you're doing this now. You can use streams also for reading.
$endgroup$
– t3chb0t
Sep 17 '18 at 14:55



ReadLines





$begingroup$
@t3chb0t: ReadLines is okay: IEnumerable<string> ReadLines (string path). It reads only the line you are currently enumerating.
$endgroup$
– Olivier Jacot-Descombes
Sep 17 '18 at 16:03


ReadLines


IEnumerable<string> ReadLines (string path)





$begingroup$
Terminology: "as HEX values" still means as a string. Hexadecimal is a way of representing binary data in a human-readable way, with 2 characters per 8-bit byte. Describing an array of bytes as "hex values" is incorrect, unless those bytes represent ASCII codes, or each byte is limited to the range 0..15 (i.e. each byte is a hex digit, not the ASCII code for a hex digit). The term you're looking for is "binary string", or "binary data". And BTW, the function of your whole program is a reverse hexdump: turn hex into binary, like the xxd command.
$endgroup$
– Peter Cordes
Sep 18 '18 at 3:05


xxd




2 Answers
2



It is OK to use File.ReadLines(...) because it actually is shortcut to StreamReader.ReadLine() (Not to be confused with File.ReadAllLines()).


File.ReadLines(...)


StreamReader.ReadLine()


File.ReadAllLines()



I wonder why you reopen the same file for each line you want to save. I would do something like this:


byte arrayByte = 0x00 ;

using (var stream = new FileStream(fileName + "_gugggu", FileMode.Create))
{
foreach (string line in File.ReadLines(fileName).Where(l => l.Contains(":10")))
{
...



This may be the real bottleneck.



I think


string partialA = line.Substring(9);
string partialB = partialA.Remove(partialA.Length - 2);



is the same as:


string subString = line.Substring(9, line.Length - 11); // 11 = 9 + 2



All in all it could be changed to :


byte arrayByte = 0x00 ;

using (var stream = new FileStream(fileName + "_gugggu", FileMode.Create))

foreach (string line in File.ReadLines(fileName).Where(l => l.Contains(":10")))

string subString = line.Substring(9, line.Length - 11);
arrayByte = ToByteArray(subString);
stream.Write(arrayByte, 0, arrayByte.Length);




A few small things to add to Henrik Hansen's answer.



This is a bit odd:


byte arrayByte = 0x00 ;



There is no need to assign this a value at all when declaring it: not doing so would prevent its use before it is assigned a meaningful value, which is good. If you must use a meaningless initial value (for whatever reason), a null would probably be better, as it will likely crash any code which tries to use it, rather than doing something meaningless/destructive.


null



Here, it would be better to simply declare arrayByte when you use it inside the loop: it's always good to define things as close to where they are used as possible, because it makes it clearer what their purpose is and limits their technical scope to their conceptual scope (e.g. it wouldn't make any sense to access arrayByte after the loop).


arrayByte


arrayByte


byte arrayByte = ToByteArray(subString);



As Henrik Handsen points out, there is no need to repeatedly open the output file and seek to the end. However, if ever you must seek to the end of a FileStream, you can use the Seek(long, SeekOrigin). This expresses the intention much more clearly than the non-trivial FileInfo route, and works with certain other types of stream as well.


FileStream


Seek(long, SeekOrigin)


FileInfo



It's unusual to see a method parameter or local variable in ProperCamelCase:


ProperCamelCase


public static byte ToByteArray(String HexString)
{
int NumberChars = HexString.Length;



The private variable is completely hidden from the external API, so is relatively unimportant, but most guidelines suggest using lowerCamelCase for local variables and method parameters.


lowerCamelCase



This method does a fair amount of work, and has nothing to do with the UI it is (I'm guessing from the name) embedded in. If it is taking a while, that will also provide a terrible user experience, as it will block the UI thread, stalling the application.



I would suggest, therefore, that it be pulled out into its own method (taking a filename parameter, rather than pulling it from what I assume is a private field/property), and perhaps making it async (maybe even with progress reporting), so that it can be conveniently run away from the UI thread.


filename


async



You have a couple of 'magic numbers' and 'magic strings' in your code, which are essentially meaningless to anyone unfamiliar with the domain. Why, for example, do you look for ":10", why do you skip the first 9 characters, and why do you remove the last 2? All of these warrant some form of documentation, ideally through the use of meaningful prior definitions (ideally with inline doc), or at least some comments explaining why those numbers are what they are.


":10"



Your code even contains "_gugggu" twice, so it's an even more obvious candidate for being replaced with a well-named variable/constant/whatever. It is all-too-easy to change such magic constants in one place but to forget to change them in another.


"_gugggu"



Additionally, inline documentation (///) on any public facing methods (ToByteArray) is always nice, and a few more carefully placed empty lines and otherwise consistent whitespace would really help to break up the logic a bit and keep the code looking tidy (e.g. fileName+"_gugggu" vs. fileName + "_gugggu"


///


ToByteArray


fileName+"_gugggu"


fileName + "_gugggu"





$begingroup$
Oh, I forgot I shouln't vote until I haven't read everything ;-P
$endgroup$
– t3chb0t
Sep 17 '18 at 18:18





$begingroup$
@t3chb0t don't worry, I wasn't watching that closely ;)
$endgroup$
– VisualMelon
Sep 17 '18 at 18:20



Thanks for contributing an answer to Code Review Stack Exchange!



But avoid



Use MathJax to format equations. MathJax reference.



To learn more, see our tips on writing great answers.



Required, but never shown



Required, but never shown




By clicking “Post Your Answer”, you agree to our terms of service, privacy policy and cookie policy

Popular posts from this blog

𛂒𛀶,𛀽𛀑𛂀𛃧𛂓𛀙𛃆𛃑𛃷𛂟𛁡𛀢𛀟𛁤𛂽𛁕𛁪𛂟𛂯,𛁞𛂧𛀴𛁄𛁠𛁼𛂿𛀤 𛂘,𛁺𛂾𛃭𛃭𛃵𛀺,𛂣𛃍𛂖𛃶 𛀸𛃀𛂖𛁶𛁏𛁚 𛂢𛂞 𛁰𛂆𛀔,𛁸𛀽𛁓𛃋𛂇𛃧𛀧𛃣𛂐𛃇,𛂂𛃻𛃲𛁬𛃞𛀧𛃃𛀅 𛂭𛁠𛁡𛃇𛀷𛃓𛁥,𛁙𛁘𛁞𛃸𛁸𛃣𛁜,𛂛,𛃿,𛁯𛂘𛂌𛃛𛁱𛃌𛂈𛂇 𛁊𛃲,𛀕𛃴𛀜 𛀶𛂆𛀶𛃟𛂉𛀣,𛂐𛁞𛁾 𛁷𛂑𛁳𛂯𛀬𛃅,𛃶𛁼

Edmonton

Crossroads (UK TV series)