Discussion:
RegEx question: how to get only digits (numbers only)
Emile Schwarz
2003-12-25 13:22:40 UTC
Hi all,

This returns all the digits that are neighbors:
rg.SearchPattern = "[0-9].*"

but how can I extract the digits from a multi-character string?

ex: get "12" from "a12", "1a2", "12a" and so on.


This is to tolerate typing errors: it sometimes happens that I (the user) type
some characters inside a number; that's why I used "a12" (and other
variations) as the example.

The following lines (a and b) seem to give the same result:
a. rg.SearchPattern = "[\d].*"
b. rg.SearchPattern = "[0-9].*"
Both return "12" for "a12" and "12a", but "1a2" fails.

The GetNumber function works fine (in another project *) because all the digits
are together. But here I need to extract all the digits from a string; this string
should normally contain only 2 to 4 digits, but a typical user can sometimes make
typos (add a character by mistake).
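
For reference, here is a minimal sketch of what a pattern like `[0-9].*` matches. It is written in Python rather than REALbasic so it can be run standalone, on the assumption that Python's `re` module treats these simple patterns the same way as the RegEx class. The helper name `first_match` is hypothetical.

```python
import re

def first_match(pattern: str, source: str) -> str:
    """Return the first match of pattern in source, or "" if there is none."""
    m = re.search(pattern, source)
    return m.group(0) if m else ""

# Try both patterns from the question on the three sample strings.
for pattern in (r"[0-9].*", r"[\d].*"):
    for text in ("a12", "1a2", "12a"):
        print(pattern, text, "->", repr(first_match(pattern, text)))
```

In this sketch the `.*` part consumes everything after the first digit, letters included, so the pattern finds where the digits start rather than isolating them.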


TIA,

Emile

The code above comes from the following function:

Function GetNumber(source As String) As String
//
// Name: GetNumber
// Inputs: source As String
// Outputs: String
// Syntax: String = GetNumber(source)
// Created: 19-12-2003; 13:20
// Creator: Emile Schwarz (emile.schwarz-***@public.gmane.org)
//
// Based on the RegEx Class example
//
  Dim rg As RegEx
  Dim myMatch As RegExMatch

  rg = New RegEx
  rg.SearchPattern = "[0-9].*"
  // "[\d].*" // Returns all digits
  // "[0-9].*" // Returns all digits too !

  myMatch = rg.Search(source)
  If myMatch = Nil Then
    Return ""
  Else
    Return myMatch.SubExpressionString(0)
  End If

Exception err As RegExException
  MsgBox err.Message
End Function

Note: to turn this function from GetNumber into GetText, change the
.SearchPattern to:

// Returns all the alphabetic characters from the start of the string
rg.SearchPattern = ".*[a-z]"



* That project scans and processes strongly typed text where each atom is
alphabetic information, sometimes followed by a number (alphabetic and numeric
characters are never mixed).



- - -
Unsubscribe or switch delivery mode:
<http://support.realsoftware.com/listmanager/>

Search the archives of this list here:
<http://support.realsoftware.com/listarchives/lists.html>
Jerry Hamlet
2003-12-26 07:39:50 UTC
The easy solution would be to search for everything that IS NOT a digit and
replace it with nothing. The string left over would be your number.

Rgx.SearchPattern = "\D+"
Rgx.ReplacementPattern = ""
Rgx.Options.ReplaceAllMatches = true

Return Rgx.Replace( source )
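
The same idea can be tried outside REALbasic. The following is a minimal Python sketch, not the REALbasic code above: it assumes Python's `re.sub` as a stand-in for a replace-all with an empty replacement, and the function name `digits_only` is hypothetical.

```python
import re

def digits_only(source: str) -> str:
    """Remove every run of non-digit characters, keeping the digits in order."""
    return re.sub(r"\D+", "", source)

# The three sample strings from the question all reduce to the same digits.
for text in ("a12", "1a2", "12a"):
    print(text, "->", digits_only(text))
```

Because the pattern targets what should be thrown away instead of what should be kept, it does not matter where in the string the stray character sits.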


:-j
Norman Palardy
2003-12-26 18:57:28 UTC
The best answer is NOT to try and fix the error after it's been made,
but to NOT let the user make the error at all.
This is where a masked editfield comes in handy. You can simply reject
anything that does not match the mask you are expecting, and you do not
have to write a lot of "fix up" code to try and figure out what the
user might have really meant.
Emile Schwarz
2003-12-29 07:55:24 UTC
Hi Norman,

thank you for your answer.

I will try that (in fact, I have to get the technology first, then try it ;)... )

Cheers,

Emil


PS: amazing how difficult is the road to make things simple...
Norman Palardy
2003-12-30 16:33:53 UTC
I have a masked edit field on my web site (members.shaw.ca/palardyn)

I've just been working with it again and realize it needs some fixing,
but the source code is all there for you to have at it.
The editing keys & such need implementing which is pretty easy.

Unlike the masks in the new alphas, this one lets you have multiple
masks per field, and it will match one of them.
It will reject anything that is not in the mask.
Emile Schwarz
2004-01-01 11:46:38 UTC
Hi,

a. the history (for the record)
--------------
What is the current goal? The user enters a date in one of two formats (a CheckBox
gives the syntax). The field separator can be a space, '-' or '/', and must be
the same throughout each date (no mixing allowed).

The current code checks the value range of the day (< 32), month (< 13) and
year (> 1900) and issues an alert otherwise; a year value below 1900 is accepted,
other values are rejected.

The first version, without any error checking, was under 15 lines, but it is
now between 50 and 100 lines (Dim, comments and empty lines included).



b. today:
---------

I checked EditField.Mask and it was good. *

BUT, at this point in the project, I reject that solution. Why? Here is where
I stand:

Pro: avoid bad typing.
Cons: only one 'field' separator is available. (see field separator below)


The previous 'solution' was to use InStr.

Pro: I can accept some separators: ' ' (space), '-', '/'...
Cons: I cannot mix two separators (see field separator below)
I can only prevent some errors: month and day field value range
I cannot prevent typing errors (adding a character inside the string)


Field separator
The entry is a date string. Two formats are allowed: DD-MM-YYYY and MM-DD-YYYY

The field separator, which appears twice, must be the same both times (no
mixing of field separators): DD-MM/YYYY is illegal.
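
The "same separator twice" rule can be expressed with a regex backreference. The following is a minimal Python sketch under stated assumptions, not the code from my project: the name `split_date`, the `day_first` flag (standing in for the CheckBox) and the choice to accept one- or two-digit day and month values are all hypothetical. It checks only the day and month ranges and leaves the year unchecked.

```python
import re

# Number, separator (captured once as group 2), number, the SAME separator
# again via the backreference \2, then a four-digit year.
DATE_RE = re.compile(r"^(\d{1,2})([ /-])(\d{1,2})\2(\d{4})$")

def split_date(text: str, day_first: bool = True):
    """Return (day, month, year) as ints, or None if the text is not well formed."""
    m = DATE_RE.match(text.strip())
    if m is None:
        return None  # stray characters or mixed separators
    first, second, year = int(m.group(1)), int(m.group(3)), int(m.group(4))
    day, month = (first, second) if day_first else (second, first)
    if not (1 <= day <= 31 and 1 <= month <= 12):
        return None  # value out of range
    return day, month, year

print(split_date("08-09-2003"))                   # DD-MM-YYYY reading
print(split_date("08/09/2003", day_first=False))  # MM-DD-YYYY reading
print(split_date("08-09/2003"))                   # mixed separators
```

With this approach a typo inside the date, such as "0a8-09-2003", is rejected rather than repaired, which may or may not be the behaviour wanted here.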


The main goal is to stay simple, but that isn't easy to achieve. Now I have to add
a "numbers only (and field separators)" 'filter' in EditField.KeyDown, then stay
with my 'current' field dispatcher.
I could eventually explore a "multi-field separator inside the typed date":
DD/MM-YYYY for example.
I could also replace the CheckBox (two choices only) with a PopupMenu (three or more
field separator choices)...
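
The "numbers only (and field separators)" filter boils down to a per-character test. Here is a minimal Python sketch of that test only; it is not REALbasic and does not model the KeyDown event itself. The names `ALLOWED_SEPARATORS` and `accept_key` are hypothetical, and editing keys such as backspace or the arrow keys are deliberately left out.

```python
# Separators allowed in the date field: space, '-' and '/'.
ALLOWED_SEPARATORS = " -/"

def accept_key(key: str) -> bool:
    """Return True if a single typed character may enter the date field."""
    if len(key) != 1:
        return False
    return key in "0123456789" or key in ALLOWED_SEPARATORS

# Simulate typing "1a2-03/2004x": letters are dropped as they are typed.
typed = "".join(ch for ch in "1a2-03/2004x" if accept_key(ch))
print(typed)
```

A filter like this stops stray letters, but it does not by itself stop mixed separators, so a separate check on the finished string would still be needed.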

Thanks,

Emile


* This sentence reminds me of something.
Norman Palardy
2004-01-01 19:19:54 UTC
Post by Emile Schwarz
Hi,
a. the history (for the record)
--------------
What is the current goal ? The user enters a date using two formats (a
CheckBox gives the syntax). The field separator can be either a space,
'-' or '/', the same in each date (no mixing allowed).
The current code check for day (< 32), month (< 13) and year (> 1900)
value range and issue an alert otherwise; a year value below 1900 is
accepted, other values are rejected.
The first version without any error checking was below 15 lines, but
I'm actually between 50 and 100 lines (Dim, Comments and empty lines
included).
---------
I checked EditField.Mask and it was good. *
BUT, in the point where I am in the project, I reject that solution.
Pro: avoid bad typing.
Cons: only one 'field' separator is available. (see field separator below)
This IS an issue with the existing MASK for editfields.
They only allow a single mask, which makes them less useful than they
could be if several masks were allowable.
Post by Emile Schwarz
The previous 'solution' was to use InStr.
Pro: I can accept some separators: ' ' (space), '-', '/'...
Cons: I cannot mix two separators (see field separator below)
I can only prevent some errors: month and day field value range
I cannot prevent typing errors (adding a character inside the string)
Field separator
The entry is a date string. Two formats are allowed: DD-MM-YYYY and MM-DD-YYYY
The used field separator - appears two times - must be unique (the
same separator used twice; no mixing field separator): DD-MM/YYYY is
illegal.
In your case, a single separator is not a bad thing.
And, allowing either of those date formats WILL cause grief because
they can be ambiguous.

For instance, is 08-09-2003 Aug 9, 2003 or Sep 8, 2003?
There is no way to tell unless you allow only one of those formats.
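
To make the ambiguity concrete, here is a small Python sketch (an illustration only, using Python's standard `datetime.strptime`) that reads the same string under both formats:

```python
from datetime import datetime

text = "08-09-2003"

# The same characters parse cleanly under both conventions.
as_day_first = datetime.strptime(text, "%d-%m-%Y").date()
as_month_first = datetime.strptime(text, "%m-%d-%Y").date()

print(as_day_first.isoformat())    # 2003-09-08
print(as_month_first.isoformat())  # 2003-08-09
```

Both parses succeed, so nothing in the string itself can say which reading was intended.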


Emile Schwarz
2004-01-02 09:04:42 UTC
Post by Norman Palardy
For instance, is 08-09-2003 Aug 9, 2003 or Sep 8, 2003 ?
There is not way to tell unless you only allow one of those formats.
No, it makes sense when you look at the Date Format CheckBox: checked = Aug 9,
unchecked = Sep 8.

The CheckBox shows the current date “syntax”: DD-MM-YYYY or MM-DD-YYYY. I was
even thinking of replacing it with a PopupMenu to offer more formats, like
YYYY-MM-DD; I must note that I don't need this format (yet). In that case too,
the user can look at the date format and type the values accordingly.


BTW: do you know why American people * use the reverse date format (MM-DD-YYYY) ?

As for the YYYY-MM-DD format, as a computer user (and developer), I consider it
the best sortable date format ever.

Cheers,

Emile


* "The colonies", as some UK people would say ;)
Jack Beckman
2004-01-02 16:47:41 UTC
Post by Emile Schwarz
BTW: do you know why American people * use the reverse date format (MM-DD-YYYY) ?
We wonder why you folks across the pond use the reverse format <g>.


Jack
Quis custodiet ipsos custodes?
Craig A. Finseth
2004-01-02 16:51:40 UTC
Post by Emile Schwarz
BTW: do you know why American people * use the reverse date format (MM-DD-YYYY) ?
We wonder why you folks across the pond use the reverse format <g>.

No, it's the US that's wrong. Everywhere else, the format goes from
smallest to largest (day, month, year) or (better) largest to smallest
(year, month, day). The only people who can't make up their minds are
here in the US...

The answer is "we're broken, that's why" (:-).

It's probably due to all those people living in the wide open spaces
who had a hard enough time keeping track of the months that they
couldn't remember the day or something.

Craig


Norman Palardy
2004-01-02 17:04:41 UTC
Post by Emile Schwarz
Post by Norman Palardy
For instance, is 08-09-2003 Aug 9, 2003 or Sep 8, 2003 ?
There is not way to tell unless you only allow one of those formats.
No, it make sense when you watch the Date Format CheckBox: checked =
Aug 9, not checked = Sept 8.
The CheckBox shows the current date “syntax”: DD-MM-YYYY or
MM-DD-YYYY. I was even thinking at replacing it with a PopupMenu for
more constructions like YYYY-MM-DD; I must note that I didn't need
this format (now or is it yet ?). In this case also, the user can
watch the date format and type the values accordingly.
As long as there is a way to use one or the other unambiguously.
Then you can just change the mask to match one or the other when the
check box is checked and unchecked.
But with RB's built-in edit field masks you can allow only 1 separator,
since you can have only 1 active mask.
If you had a list of formats like:

YYYY-MM-DD
YYYY/MM/DD
YYYY.MM.DD

that listed each format with one separator then you could do what you
want.
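
The "list of formats" idea can be sketched as a list of patterns where input is accepted if it fits any one of them. This is a minimal Python illustration, not REALbasic's EditField.Mask and not the masked edit field from my web site; `MASKS` and `matches_any_mask` are hypothetical names.

```python
import re

# Hypothetical mask list: one pattern per allowed separator, so each
# individual mask uses a single separator throughout.
MASKS = [
    re.compile(r"\d{4}-\d{2}-\d{2}"),    # YYYY-MM-DD
    re.compile(r"\d{4}/\d{2}/\d{2}"),    # YYYY/MM/DD
    re.compile(r"\d{4}\.\d{2}\.\d{2}"),  # YYYY.MM.DD
]

def matches_any_mask(text: str) -> bool:
    """Return True if the whole text fits at least one mask."""
    return any(mask.fullmatch(text) for mask in MASKS)

for sample in ("2003-12-26", "2003/12/26", "2003.12.26", "2003-12/26"):
    print(sample, "->", matches_any_mask(sample))
```

Since each mask fixes its own separator, mixed-separator input such as "2003-12/26" fits none of them.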
Post by Emile Schwarz
BTW: do you know why American people * use the reverse date format (MM-DD-YYYY) ?
I have no idea why they use that. I'm not American and I dislike that
format as well.
Post by Emile Schwarz
For the YYYY-MM-DD format, as a computer user (and developer), I
understand it as the "best sortable date format" as ever.
Personally I like that if you have to use an all numeric format.

The only other one I've used in programs is YYYY MMM DD (2003 Jan 01)
which you can write in any order and not make it ambiguous
Terry Ford
2004-01-02 19:43:41 UTC
Post by Norman Palardy
I have no idea why they use that. I'm not American and I dislike
that format as well.
Ahh...the Americanization of the 'Cheeseheads/Canucks' (Canadians
like you and me). Logically the sequence should be from the largest
quantity to the smallest. Year/Month/Day/Hour/Minute/Second. Now here
is where it gets confusing. (Actually it started long before the
colonies were discovered)

Since "time representation" has traditionally *not* been metric,
we have both Ticks (1/60th of a second) and Microseconds (1/1,000,000th)
functions in RB, with milliseconds (1/1000) in timers.

I guess we'll simply have to blame those people who originated the
mathematics of 360 degree circles and 24 hour days. ;-)
--
Terry

Norman Palardy
2004-01-02 21:58:29 UTC
Post by Norman Palardy
I have no idea why they use that. I'm not American and I dislike that
format as well.
Ahh...the Americanization of the 'Cheeseheads/Canucks' (Canadians like
you and me). Logically the sequence should be from the largest
quantity to the smallest. Year/Month/Day/Hour/Minute/Second. Now here
it where it gets confusing. (Actually it started long before the
colonies were discovered)
Acckk .... not cheeseheads .... they're in Minnesota.

I prefer the YYYYMMDDHHMMSS format for many things and have used that
since my mainframe days
It sorts nicely and sensibly ascending or descending.
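
A quick way to see why this format sorts well: the fields run from most to least significant and have fixed widths, so plain string sorting gives chronological order. Here is a minimal Python sketch, with sample values chosen purely for illustration:

```python
# Timestamps in YYYYMMDDHHMMSS form, deliberately out of order.
stamps = ["20040102165140", "20031225132240", "20031230163353"]

# Ordinary string comparison is enough; no date parsing is needed.
ascending = sorted(stamps)
descending = sorted(stamps, reverse=True)

print(ascending)
print(descending)
```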
I guess we'll simply have to blame those people who originated the
mathematics of 360 degree circles and 24 hour days. ;-)
The Babylonians ? Wow .... this is an OLD problem


Terry Ford
2004-01-02 22:24:01 UTC
Post by Norman Palardy
Acckk .... not cheeseheads .... they're in Minnesota.
Did a Google on "Cheeseheads" and "Chedderheads". They are actually
from Wisconsin. (Green Bay Packers?) I must have been mistaken about
it referring to Canadians but I sort of remembered it mentioned about
ten years ago.
--
Terry

Norman Palardy
2004-01-02 23:24:26 UTC
Post by Terry Ford
Post by Norman Palardy
Acckk .... not cheeseheads .... they're in Minnesota.
Did a Google on "Cheeseheads" and "Chedderheads". They are actually
from Wisconsin. (Green Bay Packers?) I must have been mistaken about
it referring to Canadians but I sort of remembered it mentioned about
ten years ago.
Right.
Wisconsin is CheddarHeads (Packers fans)

Never heard it applied to Canadians.

The word hoser is a whole different thing though ... :-)


Emile Schwarz
2004-01-03 10:15:21 UTC
Post by Terry Ford
I guess we'll simply have to blame those people who originated the
mathematics of 360 degree circles and 24 hour days.
No, this is a false assumption.

a. the 24-hour day comes from the Earth's rotation on its axis (astronomy, I would say),
b. 360-degree circles have nothing to do with time (I think).

And in the end, we were talking about the syntactic representation of the date.

Last but not least, I recall a TV documentary showing that South American
natives (before 1492) had a fine representation of time (60 minutes to an
hour, 24 hours a day and 365.xx days a year) well before us on the old
continent (both Europe and the Middle East)...

It's amazing to see that Homo sapiens has reinvented the wheel all along its
evolution (by not sharing knowledge, voluntarily or not).

But here, I am OT from REALbasic... ;)

Have a nice Sunday,

Emile