UKC

Python regex help

This topic has been archived, and won't accept reply postings.
 Climber_Bill 14 Mar 2016
I need some help with a Python regex problem in a script.

The script runs through log files and searches for various matches and then adds that match to a list. The list is then appended to a text file.

I need only the first match and for the script to ignore the rest.

import os
import re

for file in os.listdir('.'):
    detailsList = []
    if file.endswith(".log"):
        logfile = open(file, 'r')
        for line in logfile:
            orderItem = re.compile("[][A-Z]{2}[0-9]{3,7}[.]")
            orderItemMatch = orderItem.search(line)
            if orderItemMatch:
                orderItemMatch = orderItemMatch.group()
                orderItemMatch = orderItemMatch.translate(None, ".")
                #print orderItemMatch
                detailsList.append(orderItemMatch)

The problem is that there are multiple matches, and they are all added to the list and then to the text file.

Is there a way of limiting the search to only the first match, or of doing something else suitable?

Thanks for any help.

TJB.
KevinD 14 Mar 2016
In reply to Climber_Bill:

My Python ain't the strongest, but looking at that, it seems like it's not a regex issue as such, just how you are accessing it.
Since you go through each line in turn, it gets the chance to match multiple times. I would either look at searching the entire file at once (depending on size) or have it check against detailsList before appending.
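
A minimal sketch of the second idea (check the list before appending), written to run on Python 2 or 3. The pattern is a simplified guess at the one in the original post, and the function name and sample lines are made up for illustration:

```python
import re

# Assumed, simplified pattern: two capitals, 3-7 digits, then a full stop.
ORDER_ITEM = re.compile(r"[A-Z]{2}[0-9]{3,7}[.]")


def unique_order_items(lines):
    """Return each matched order item once, in order of first appearance."""
    details = []
    for line in lines:
        match = ORDER_ITEM.search(line)
        if match:
            item = match.group().replace(".", "")
            if item not in details:  # skip items already collected
                details.append(item)
    return details


# Hypothetical log lines, not taken from the real files.
sample = ["start AB1234. ok", "retry AB1234. ok", "next CD567. ok"]
print(unique_order_items(sample))  # ['AB1234', 'CD567']
```

Note that this keeps one copy of every distinct item rather than only the first match in the file, so it only fits if repeats of the same item are the real problem.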
 SenzuBean 14 Mar 2016
In reply to Climber_Bill:
Why does it have to be Python? Are you getting us to do your homework...?

The problem you are working on was solved 40 years ago with a tool called grep. It'll be much quicker if you just use that to search your log files than trying to write your own version.
Post edited at 15:47
OP Climber_Bill 14 Mar 2016
In reply to KevinD:

Doing a check against the list and only adding if it doesn't already exist is a good idea. I'll try that.

I did think about searching the entire file but thought it would still return all matches.
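
For what it's worth, as far as the re documentation goes, search() stops at the first match even when it is given the whole file's contents; it is findall() and finditer() that return every match. A rough sketch, assuming a simplified version of the pattern and a made-up string standing in for the result of logfile.read():

```python
import re

# Assumed, simplified pattern: two capitals, 3-7 digits, then a full stop.
ORDER_ITEM = re.compile(r"[A-Z]{2}[0-9]{3,7}[.]")


def first_order_item(text):
    """Return the first order item found in text, or None if there is none."""
    match = ORDER_ITEM.search(text)  # search() returns only the first match
    return match.group().rstrip(".") if match else None


# Hypothetical file contents, standing in for logfile.read().
log_text = "start AB1234. ok\nretry AB1234. ok\nnext CD567. ok\n"

print(first_order_item(log_text))    # AB1234
print(ORDER_ITEM.findall(log_text))  # ['AB1234.', 'AB1234.', 'CD567.']
```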

Thanks.

TJB.
OP Climber_Bill 14 Mar 2016
In reply to SenzuBean:

I have done it in bash using grep, which obviously does allow for the search to stop after the first match, but want to improve my knowledge of Python and regex, that's all.

Also, the files are on a Windows server.

Homework? That's a good one. Unfortunately, I'm long past the days of homework.

Cheers,

TJB.
 SenzuBean 14 Mar 2016
In reply to Climber_Bill:

> I have done it in bash using grep, which obviously does allow for the search to stop after the first match, but want to improve my knowledge of Python and regex, that's all.

> Also, the files are on a Windows server.

> Homework? That's a good one. Unfortunately, I'm long past the days of homework.

> Cheers,

> TJB.

Ah okay - well that wasn't clear in the OP. And Windows has grep tools too - so you can do that.

If you want to improve your programming ability, start writing modular code, and write your own unit tests, component tests, and system-level tests to check that everything works. There is almost no better way to improve.
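
As a rough illustration only: the matching logic could be pulled out into a small function and checked with the standard unittest module. The function name, the simplified pattern and the test data below are all assumptions, not taken from the real script:

```python
import re
import unittest

# Assumed, simplified pattern: two capitals, 3-7 digits, then a full stop.
ORDER_ITEM = re.compile(r"[A-Z]{2}[0-9]{3,7}[.]")


def first_order_item(lines):
    """Return the first order item found in lines, or None if none match."""
    for line in lines:
        match = ORDER_ITEM.search(line)
        if match:
            return match.group().rstrip(".")
    return None


class FirstOrderItemTest(unittest.TestCase):
    def test_returns_first_match_only(self):
        self.assertEqual(first_order_item(["x AB123. y", "CD4567."]), "AB123")

    def test_returns_none_without_match(self):
        self.assertIsNone(first_order_item(["nothing here"]))


# Run the tests without exiting the interpreter.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(FirstOrderItemTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Because the function takes any iterable of lines, it can be tested with a plain list and used unchanged with an open file.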

Regarding your specific question: there is a 'break' statement you can use to exit the loop early. If you put it straight after the append, the loop stops at the first match and you'll only have a single item in the list. There are probably clearer ways than using for, if and break statements, however. I haven't used Python for years, and never properly, so I can't help with specifics, but in general imperative-style programming is to be avoided in favor of declarative style.
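
A sketch of both approaches, with the usual caveats: the pattern is a simplified guess at the original, and the function names and sample lines are invented for the example. The first version uses break; the second gets the same result with next() over a generator, which is one possible reading of "more declarative":

```python
import re

# Assumed, simplified pattern: two capitals, 3-7 digits, then a full stop.
ORDER_ITEM = re.compile(r"[A-Z]{2}[0-9]{3,7}[.]")


def first_match_with_break(lines):
    """Collect only the first match by leaving the loop after appending it."""
    details = []
    for line in lines:
        match = ORDER_ITEM.search(line)
        if match:
            details.append(match.group().replace(".", ""))
            break  # stop reading lines once one item is stored
    return details


def first_match_with_next(lines):
    """Same result, taking the first successful match from a lazy generator."""
    matches = (ORDER_ITEM.search(line) for line in lines)
    first = next((m for m in matches if m), None)
    return [first.group().replace(".", "")] if first else []


# Hypothetical log lines, not taken from the real files.
sample = ["no item here", "start AB1234. ok", "next CD567. ok"]
print(first_match_with_break(sample))  # ['AB1234']
print(first_match_with_next(sample))   # ['AB1234']
```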

Age is no barrier to homework.
 remus Global Crag Moderator 14 Mar 2016
In reply to Climber_Bill:

https://gist.github.com/anonymous/06299d9943cc9cc8262d

Your question really just boils down to how you want to use the Python standard library. The docs are great: https://docs.python.org/2/library/re.html

I included a couple of other pointers to clean up the code.
OP Climber_Bill 15 Mar 2016
In reply to remus:

Thanks Remus, that's really helpful.

Cheers,

TJB.
