Difference between revisions of "Search regexp"

From AMule Project FAQ
 
m (Removed Version Tag)
 
(15 intermediate revisions by 3 users not shown)
Line 1: Line 1:
related::<md4hash>
+
== Description ==
 +
This article explains how to tweak [[search]]es and some handy tips and tricks when searching with [[aMule]].
  
So someone could try to search something like
+
Note that these tricks may not always work, since they depend on the software the [[server]] is running. However, most [[ED2k|ed2k]] servers run the latest version of [[lugdunum]], so these tricks will work on 99% of servers.
  
related::<md4hash> AND Video AND SIZE > 1000000
+
== First notes ==
 +
Searches always search for all the words in the query and are case-insensitive.
  
The server checks if the file is known.
+
== Boolean search ==
 +
Boolean search is supported since aMule 2.1.0. It allows finely tuned search queries.
  
If not -> End of request, 0 result.
+
The following operators are available:
  
If yes : It scans the list of clients that share this file.
+
{|+ Operators
 +
| '''''Operator''''' || '''Meaning'''
 +
|-
 +
| ''AND'' || Both the query before and after the operator must match
 +
|-
 +
| ''OR'' || Either the query before or after the operator must match
 +
|-
 +
| ''NOT'' || The query after the operator must not match
 +
|}
  
A temporary 'working set' is initialized to empty.
+
Instead of ''AND'' you can use ''&'', and instead of ''NOT'' you can simply use ''!'' (with '''no''' space after it).
  
For each client in the list, it scans the list of its shares, adding found file in a working set :
+
Example: ''xfree OR xorg''
  
If the file has low availability (such as 1, meaning only one person shares it), ignore it.
+
Example: ''drivers AND linux''
  
If the file (md4 hash) is already in the working set, adds 1 to the 'related count'
+
Example: ''some_free_music AND NOT some_artist'' is equivalent to ''some_free_music & !some_artist''
  
If not, check if the file meets the search criteria (if any was specified in the search req)
+
So, when you enter a search query, it is read and each word is understood either as an operator (when it is ''AND'', ''&'', ''OR'', ''NOT'' or ''!'') or as a query.
  
If the file meets the criteria, adds it to the working set with the 'related count' set to 1.
+
Example: ''aMule AND MacOS'' will search for files containing ''aMule'' and ''MacOS'' in their filename.
  
Some logic could be added to make sure the working set could not use too much ram (if a threshold is hit, just do a garbage collection to free half of the entries for example)
+
If no operator is found between two words, it defaults to the ''AND'' operator.
  
Then sort the working set by the 'related count' key, and give the 300 files having the highest 'related count'. We then free the working set (no 'more' request can be sent to the server to get the next 300 files, because keeping the working set in memory would be too expensive)
+
Note that operators are case-sensitive: ''AND'' is an operator, while ''and'' is an ordinary word.
  
---
+
Words are any sequences of characters found between any of the following characters:
  
Searches for multiple file extensions, support not<space> or !<no_space> operator too in file extension (like "zip,rar,cbz,cbr" or "!wme,!wma" or "not wme")
+
{|+ Word separators
 +
| '''Character''' || '''''Description'''''
 +
|-
 +
| , || ''Comma''
 +
|-
 +
| ; || ''Semicolon''
 +
|-
 +
| . || ''Dot''
 +
|-
 +
| : || ''Colon''
 +
|-
 +
| - || ''Dash''
 +
|-
 +
| _ || ''Underscore''
 +
|-
 +
| ' || ''Apostrophe''
 +
|-
 +
| / || ''Slash''
 +
|-
 +
| ! || ''Bang''
 +
|-
 +
|  || ''Space''
 +
|}
  
---
+
Null words are ignored.
  
Ability to perform exact searches : Clients can enclose words in ' or " (next emule version needed for ") . Examples : 'blank & john' OR 'the the'
+
Example: ''aMule, MacOS'' is equivalent to ''aMule,MacOS'' which is equivalent to ''aMule AND MacOS''.
  
---
+
You can group words and operators in brackets to turn them into a single query with its own sub-queries.
  
words separators: , ; . : - _ ' / !
+
Example: ''aMule & (MacOS OR Win)'' will search for all files containing ''aMule'' and either ''MacOS'' or ''Win'' in their filename.
  
---
+
Brackets must always be matched.
  
and & or not
+
Example: ''aMule AND optionA)'' will fail since the closing bracket matches no opening bracket.
  
---
+
Example: ''aMule OR (linux'' will also fail since the opening bracket matches no closing bracket.
  
Support for searches by files hashes. It accepts several links (ed2k://|file|name|size|Hash (anything after the hash will be ignored)), (ed2k:size:Hash), (magnet:?xt=ed2k:Hash), ...
+
== Exact matches ==
 +
Exact matches are an extension of boolean searches.
  
ed2k::<md4hash> or ed2k:<size>:<md4hash>
+
You may want to search for a series of words together or some other string containing word separators in it. To do so, wrap that string with apostrophes.
  
---
+
Example: ''aMule AND '2.1.0' '' searches for files containing ''aMule'' and ''2.1.0'' in their filename (notice that the dots are no longer interpreted as word separators).
  
The letter ñ is an alias for the letter n in searches. A search for 'espana' now matches 'españa'
+
Example: ''aMule'2.1.0' '' is exactly equivalent to ''aMule AND '2.1.0' '', since the apostrophe is still a word separator, so ''aMule'' will be one word and ''2.1.0'' another (and the default operator is ''AND'').
 +
 
 +
Still, apostrophes will not wrap brackets.
 +
 
 +
Example: ''aMule AND 'distro (debian)' '' will fail because the apostrophes appear to be opened but not closed, since there is a bracket (actually, two of them) in between.
 +
 
 +
The only way to wrap brackets so that they can be searched as part of a word is to wrap them in double quotes. Double quotes can also wrap apostrophes.
 +
 
 +
Example: ''"aMule's 2nd birthday (19-08-2005)"''
 +
 
 +
Example: ''aMule"2.1.0" '' is exactly equivalent to ''aMule AND "2.1.0" '', since the double quotes are still word separators, so ''aMule'' will be one word and ''2.1.0'' another (and the default operator is ''AND'').
 +
 
 +
== Search for file except for extension ... ==
 +
You can use the ''not <query>'' and ''!<query>'' boolean operators in the "extension" field in the [[Usage_Searches|search window]].
 +
 
 +
This way you can search for files not containing the given extension, which is often very useful.
 +
 
 +
== Find files similar or related to some other file ==
 +
In the search box, enter ''related::<hash>'', where ''<hash>'' is the [[hash]] value of some [[file]]. The results will be files that are related or similar to that file.
 +
 
 +
What the server actually does is read an index of all the files that [[client]]s are sharing and, among the clients sharing the file with hash value ''<hash>'', determine which other files are the most popular. Low-[[availability]] files are not listed.
 +
 
 +
== Search for hashes or exact file ==
 +
If you want to search for any file whose hash value is ''<hash>'' (where ''<hash>'' is any MD4 hash value), you can search for ''ed2k::<hash>'' and you will get the results.
 +
 
 +
As an extension, if you want to find one exact file (for instance to check its availability or its [[rate]]) and searching by hash value returns several different files, you can narrow the results by searching for both the file's hash value ''<hash>'' and its size ''<size>'': ''ed2k:<size>:<hash>''
 +
 
 +
Or even simply the file's [[ed2k link]] (anything after the file's hash in the link will be ignored): ''ed2k://|file|<name>|<size>|<hash>''
 +
 
 +
== The special 'Ñ' character ==
 +
Current server and client software supports [[unicode|Unicode]], so this is no longer an issue, but older versions did not support non-English characters, such as the Spanish ''ñ''.
 +
 
 +
As a solution, the ''ñ'' character was aliased to ''n''. So, searching for ''españa'' or ''espana'' would give the same results.
 +
 
 +
This aliasing also applies to Unicode-supporting clients and servers. The only thing to note is that in this case, since ''ñ'' is a different character from ''n'' and Unicode distinguishes them, searching for words containing ''n'' will also return results containing ''ñ'', but not the other way round.
 +
 
 +
== Notes ==
 +
You can combine the above tricks, so someone could try to search something like ''related::<some_hash> AND Video AND SIZE > 1000000''

Latest revision as of 10:22, 29 June 2008

Description

This article explains how to tweak searches and some handy tips and tricks when searching with aMule.

Note that these tricks may not always work, since they depend on the software the server is running. However, most ed2k servers run the latest version of lugdunum, so these tricks will work on 99% of servers.

First notes

Searches always search for all the words in the query and are case-insensitive.

Boolean search

Boolean search is supported since aMule 2.1.0. It allows finely tuned search queries.

The following operators are available:

Operator   Meaning
AND        Both the query before and after the operator must match
OR         Either the query before or after the operator must match
NOT        The query after the operator must not match

Instead of AND you can use &, and instead of NOT you can simply use ! (with no space after it).

Example: xfree OR xorg

Example: drivers AND linux

Example: some_free_music AND NOT some_artist is equivalent to some_free_music & !some_artist

So, when you enter a search query, it is read and each word is understood either as an operator (when it is AND, &, OR, NOT or !) or as a query.

Example: aMule AND MacOS will search for files containing aMule and MacOS in their filename.

If no operator is found between two words, it defaults to the AND operator.

Note that operators are case-sensitive: AND is an operator, while and is an ordinary word.
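
The operator rules above can be sketched in a few lines of Python. This is only an illustration, not the parser that aMule or the server actually uses: the names OPERATORS and classify are hypothetical, and the sketch splits on whitespace only, so it does not handle the attached form !word.

```python
# Hypothetical illustration of the operator rules described above.
# Maps every accepted spelling to its canonical operator name.
OPERATORS = {"AND": "AND", "&": "AND", "OR": "OR", "NOT": "NOT", "!": "NOT"}


def classify(query):
    """Tag each whitespace-separated token as an operator or a word.

    Operators are matched case-sensitively, and an implicit AND is
    inserted between two adjacent words.
    """
    tagged = []
    for token in query.split():
        if token in OPERATORS:
            tagged.append(("op", OPERATORS[token]))
        else:
            if tagged and tagged[-1][0] == "word":
                tagged.append(("op", "AND"))  # default operator
            tagged.append(("word", token))
    return tagged


print(classify("drivers linux"))
print(classify("aMule and MacOS"))  # lowercase "and" stays a word
```

In this sketch the lowercase and in the second query is kept as a third search word, which is why the case of the operators matters.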

Words are any sequences of characters found between any of the following characters:

Character   Description
,           Comma
;           Semicolon
.           Dot
:           Colon
-           Dash
_           Underscore
'           Apostrophe
/           Slash
!           Bang
(space)     Space

Null words are ignored.

Example: aMule, MacOS is equivalent to aMule,MacOS which is equivalent to aMule AND MacOS.
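
As a rough sketch of the word-splitting rule, assuming the separator list in the table above is complete and ignoring quoting, the following Python function splits a string into words and drops null words. The names SEPARATORS and split_words are made up for this example.

```python
import re

# The word separators listed in the table above (the last one is a space).
SEPARATORS = ",;.:-_'/! "

# One or more consecutive separators count as a single split point.
_SPLIT_RE = re.compile("[" + re.escape(SEPARATORS) + "]+")


def split_words(text):
    """Split text on the separators and ignore null (empty) words."""
    return [word for word in _SPLIT_RE.split(text) if word]


print(split_words("aMule, MacOS"))
print(split_words("aMule,MacOS"))
```

Both calls give the same two words, which matches the equivalence stated in the example above.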

You can group words and operators in brackets to turn them into a single query with its own sub-queries.

Example: aMule & (MacOS OR Win) will search for all files containing aMule and either MacOS or Win in their filename.

Brackets must always be matched.

Example: aMule AND optionA) will fail since the closing bracket matches no opening bracket.

Example: aMule OR (linux will also fail since the opening bracket matches no closing bracket.
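
A query can be checked for matched brackets before it is sent. The following is a minimal sketch with a hypothetical function name; it ignores quoting entirely and only counts brackets.

```python
def brackets_balanced(query):
    """Return True when every bracket in the query has a partner."""
    depth = 0
    for char in query:
        if char == "(":
            depth += 1
        elif char == ")":
            depth -= 1
            if depth < 0:
                # A closing bracket appeared before any opening one.
                return False
    # Any bracket still open at the end is unmatched.
    return depth == 0


print(brackets_balanced("aMule & (MacOS OR Win)"))
print(brackets_balanced("aMule AND optionA)"))
print(brackets_balanced("aMule OR (linux"))
```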

Exact matches

Exact matches are an extension of boolean searches.

You may want to search for a series of words together or some other string containing word separators in it. To do so, wrap that string with apostrophes.

Example: aMule AND '2.1.0' searches for files containing aMule and 2.1.0 in their filename (notice that the dots are no longer interpreted as word separators).

Example: aMule'2.1.0' is exactly equivalent to aMule AND '2.1.0', since the apostrophe is still a word separator, so aMule will be one word and 2.1.0 another (and the default operator is AND).

Still, apostrophes will not wrap brackets.

Example: aMule AND 'distro (debian)' will fail because the apostrophes appear to be opened but not closed, since there is a bracket (actually, two of them) in between.

The only way to wrap brackets so that they can be searched as part of a word is to wrap them in double quotes. Double quotes can also wrap apostrophes.

Example: "aMule's 2nd birthday (19-08-2005)"

Example: aMule"2.1.0" is exactly equivalent to aMule AND "2.1.0", since the double quotes are still word separators, so aMule will be one word and 2.1.0 another (and the default operator is AND).
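
The behaviour of double quotes can be illustrated with a small Python sketch. It is an assumption-laden simplification: the function name split_with_quotes is hypothetical, only double quotes are handled (not apostrophes), unterminated quotes are not reported, and unquoted text is split on whitespace only.

```python
import re

# Either a double-quoted phrase (kept whole) or a run of characters
# containing neither whitespace nor a double quote.
_TOKEN_RE = re.compile(r'"([^"]*)"|([^\s"]+)')


def split_with_quotes(query):
    """Split a query, treating each double-quoted phrase as one word."""
    tokens = []
    for quoted, bare in _TOKEN_RE.findall(query):
        token = quoted or bare
        if token:  # skip empty phrases such as ""
            tokens.append(token)
    return tokens


print(split_with_quotes('aMule"2.1.0"'))
print(split_with_quotes('"aMule\'s 2nd birthday (19-08-2005)"'))
```

The first call yields aMule and 2.1.0 as two separate words, and the second keeps the whole phrase, brackets and apostrophe included, as a single word.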

Search for file except for extension ...

You can use the not <query> and !<query> boolean operators in the "extension" field in the search window.

This way you can search for files that do not have the given extension, which is often very useful.
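
To make the effect of such an extension filter concrete, here is a hypothetical Python sketch. It assumes a comma-separated list in the extension field in which each entry is either an extension, !extension, or not extension; the function name and the exact matching rules are assumptions, not aMule's or the server's actual code.

```python
def extension_allowed(filename, spec):
    """Check a filename against a comma-separated extension filter."""
    extension = filename.rsplit(".", 1)[-1].lower() if "." in filename else ""
    wanted, banned = set(), set()
    for item in spec.split(","):
        item = item.strip().lower()
        if item.startswith("!"):
            banned.add(item[1:])          # "!wma"
        elif item.startswith("not "):
            banned.add(item[4:].strip())  # "not wma"
        elif item:
            wanted.add(item)              # plain extension
    if extension in banned:
        return False
    # With no positive entries, anything not banned is accepted.
    return not wanted or extension in wanted


print(extension_allowed("song.wma", "!wme,!wma"))
print(extension_allowed("song.mp3", "!wme,!wma"))
```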

Find files similar or related to some other file

In the search box, enter related::<hash>, where <hash> is the hash value of some file. The results will be files that are related or similar to that file.

What the server actually does is read an index of all the files that clients are sharing and, among the clients sharing the file with hash value <hash>, determine which other files are the most popular. Low-availability files are not listed.
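
The idea behind a related search can be sketched as follows. This is a simplified model, not the server's implementation: the shares dictionary (client name to set of file hashes), the function name and the min_availability threshold of 2 are assumptions made for the example, while the limit of 300 results comes from the earlier revision of this page.

```python
from collections import Counter


def related_files(shares, target_hash, min_availability=2, limit=300):
    """Rank the files most often shared alongside target_hash."""
    # Availability: how many clients share each file.
    availability = Counter(h for files in shares.values() for h in files)

    related = Counter()
    for files in shares.values():
        if target_hash not in files:
            continue  # only clients sharing the target file count
        for file_hash in files:
            if file_hash == target_hash:
                continue
            if availability[file_hash] < min_availability:
                continue  # low-availability files are ignored
            related[file_hash] += 1

    return [file_hash for file_hash, _ in related.most_common(limit)]


# Toy data with made-up hashes.
shares = {
    "client1": {"T", "A", "B"},
    "client2": {"T", "A"},
    "client3": {"B", "C"},
    "client4": {"T", "D"},
}
print(related_files(shares, "T"))
```

In the toy data, file D is shared together with T but only by one client, so it is dropped, and A ranks above B because two of the clients sharing T also share A.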

Search for hashes or exact file

If you want to search for any file whose hash value is <hash> (where <hash> is any MD4 hash value), you can search for ed2k::<hash> and you will get the results.

As an extension, if you want to find one exact file (for instance to check its availability or its rate) and searching by hash value returns several different files, you can narrow the results by searching for both the file's hash value <hash> and its size <size>: ed2k:<size>:<hash>

Or even simply the file's ed2k link (anything after the file's hash in the link will be ignored): ed2k://|file|<name>|<size>|<hash>
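
Although the link itself can be pasted directly, it may help to see how the size and hash can be pulled out of an ed2k link to build the ed2k:<size>:<hash> form. The sketch below is illustrative only: the function name is hypothetical, and the link in the usage line uses a made-up file name, size and hash.

```python
import re


def link_to_query(link):
    """Turn ed2k://|file|<name>|<size>|<hash>... into ed2k:<size>:<hash>."""
    parts = link.split("|")
    if len(parts) < 5 or parts[0] != "ed2k://" or parts[1] != "file":
        raise ValueError("not an ed2k file link")
    size, file_hash = parts[3], parts[4]
    if not size.isdigit():
        raise ValueError("size must be a number")
    # An MD4 hash is 128 bits, written as 32 hexadecimal digits.
    if not re.fullmatch(r"[0-9A-Fa-f]{32}", file_hash):
        raise ValueError("hash must be 32 hexadecimal digits")
    # Anything after the hash is ignored.
    return "ed2k:{}:{}".format(size, file_hash)


print(link_to_query(
    "ed2k://|file|example.iso|12345|0123456789ABCDEF0123456789ABCDEF|/"
))
```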

The special 'Ñ' character

Current server and client software supports Unicode, so this is no longer an issue, but older versions did not support non-English characters, such as the Spanish ñ.

As a solution, the ñ character was aliased to n. So, searching for españa or espana would give the same results.

This aliasing also applies to Unicode-supporting clients and servers. The only thing to note is that in this case, since ñ is a different character from n and Unicode distinguishes them, searching for words containing n will also return results containing ñ, but not the other way round.
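
The one-way nature of this aliasing can be modelled with a short Python sketch. It is an interpretation of the paragraph above, not the server's code: the function names are hypothetical, and the comparison is done on whole words for simplicity.

```python
def fold_n(text):
    """Lowercase the text and replace every ñ with a plain n."""
    return text.lower().replace("ñ", "n")


def word_matches(search_word, filename_word):
    """Model the one-way alias: n finds ñ, but ñ does not find n."""
    search = search_word.lower()
    if "ñ" in search:
        # A search containing ñ is compared literally.
        return search == filename_word.lower()
    # A search with plain n is compared against the folded filename word.
    return search == fold_n(filename_word)


print(word_matches("espana", "españa"))
print(word_matches("españa", "espana"))
```

Under these assumptions the first call matches and the second does not.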

Notes

You can combine the above tricks, so someone could try to search something like related::<some_hash> AND Video AND SIZE > 1000000