
Constructive follow-up on forum search engine problems with "T-72" and similar terms


Redwolf


Use Google and its "site:" operator.

Example:

http://www.google.com/search?hl=en&ie=UTF-8&q=t-72%20site%3Ahttp%3A//www.battlefront.com/community/&btnG=Google+Search

Change the search term as needed and leave the "site:htt..." part alone.
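For anyone who wants to build such links for other terms, here is a rough Python sketch that assembles the same kind of query. The `hl` and `q` parameters come from the example URL above; the function name `forum_search_url` is only an illustration, and Google may handle these parameters differently today.

```python
from urllib.parse import urlencode

# Values taken from the example link above.
GOOGLE_SEARCH = "http://www.google.com/search"
FORUM_PREFIX = "http://www.battlefront.com/community/"


def forum_search_url(term: str) -> str:
    """Return a Google search URL for `term`, limited to the forum via site:."""
    query = f"{term} site:{FORUM_PREFIX}"
    # urlencode percent-escapes ':' and '/' and turns spaces into '+'.
    return GOOGLE_SEARCH + "?" + urlencode({"hl": "en", "q": query})


print(forum_search_url("T-72"))
```

The dash in "T-72" passes through unescaped, so Google sees the term exactly as typed.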

Moon, I think there's a vBulletin extension that adds a Google search box next to the normal vBulletin internal search; I've seen that on other forums.


Nice one, Redwolf, although that example mostly turns up Balkans On Fire hits, and Google has some other limitations as well.

I'm still wondering if something could be done to improve the board's own search. I looked at the vBulletin manual and found a link there to this MySQL manual page.

Now, there are some interesting-looking bits about allowing the dash as a searchable character:

How I added '-' to the list of word characters:

The documentation is weak in two regards: (1) it doesn't explain how to modify the map, and (2) it doesn't touch on the implications of doing so. I'll try to address (1), but cannot begin to speak to (2).

The charset files live in the directory given by the "character_sets_dir" system variable (use SHOW VARIABLES to see it), which is typically compiled in as "/usr/share/mysql/charsets". The file name comes from the "character_set_..." variables; typically the default is "latin1". Thus the file I needed to change was /usr/share/mysql/charsets/latin1.xml.
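That lookup can be scripted. This Python sketch assumes you have saved the output of `mysql -N -B -e "SHOW VARIABLES"` (batch mode, no column names, so one tab-separated name/value pair per line) and that `character_set_server` is the variable that matters for your tables. The helper names and the sample output are made up for illustration.

```python
import posixpath


def parse_show_variables(output: str) -> dict:
    """Parse 'name<TAB>value' lines as printed by: mysql -N -B -e "SHOW VARIABLES"."""
    variables = {}
    for line in output.splitlines():
        name, sep, value = line.partition("\t")
        if sep:  # skip anything that is not a name/value pair
            variables[name] = value
    return variables


def charset_file(variables: dict, charset_var: str = "character_set_server") -> str:
    """Path of the XML file defining the charset named by `charset_var` (assumed variable)."""
    return posixpath.join(variables["character_sets_dir"], variables[charset_var] + ".xml")


# Hypothetical sample output, not taken from a real server.
sample = "character_set_server\tlatin1\ncharacter_sets_dir\t/usr/share/mysql/charsets/\n"
print(charset_file(parse_show_variables(sample)))
```

If your tables use a different charset, pass the matching "character_set_..." variable as `charset_var`.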

The <ctype><map> element is the one we are after (the other maps are "upper", "lower", "unicode" and the various collation maps).

The "ctype" map differs from the others in that it has a leading 0x00 before the character entries, the meaning of which is unclear to me. Each entry of the map appears to classify the corresponding character according to the following bitmask:

0x01 Upper-case word character

0x02 Lower-case word character

0x04 Decimal digit

0x08 Printer control (Space/TAB/VT/FF/CR)

0x10 Not-white, not a word

0x20 Control-char (0x00 - 0x1F)

0x40 Space

0x80 Hex digit (0-9, a-f, A-F)
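To read a map entry against that bitmask, a small Python helper can list which bits are set. The labels are the commenter's descriptions above, not names from the MySQL source, and `describe` is a hypothetical helper.

```python
# Bit values and labels as listed in the comment above.
CTYPE_FLAGS = {
    0x01: "upper-case word character",
    0x02: "lower-case word character",
    0x04: "decimal digit",
    0x08: "printer control (space/TAB/VT/FF/CR)",
    0x10: "not white, not a word",
    0x20: "control character",
    0x40: "space",
    0x80: "hex digit",
}


def describe(entry: int) -> list:
    """Return the label of every flag set in one ctype map entry."""
    return [label for bit, label in CTYPE_FLAGS.items() if entry & bit]


print(describe(0x48))
print(describe(0x84))
```

An entry can carry several bits at once: 0x48 decodes as both "printer control" and "space", and 0x84 as "decimal digit" plus "hex digit".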

In my case, I needed the dash '-', but nothing else, so I altered the corresponding character position (0x2D - third row, third from the right) from 0x10 (Not-white, not a word) to 0x01 (Upper-case word).
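The edit itself can be scripted. This Python sketch assumes you have already pulled the text of the `<ctype><map>` element out of latin1.xml, and that the map is laid out as described: the leading 0x00 first, then one bare two-digit hex entry per character, 16 per row. It swaps a single entry in place so the rest of the layout is untouched. `set_ctype_entry` and `map_position` are hypothetical names, and reading or writing the file is left out.

```python
import re

# One map entry: a standalone two-digit hex token such as "10" (assumed format).
HEX_ENTRY = re.compile(r"\b[0-9A-Fa-f]{2}\b")


def set_ctype_entry(map_text: str, code_point: int, new_value: int) -> str:
    """Replace the ctype entry for one character, keeping the original layout.

    The first token is the unexplained leading 0x00, so the entry for
    character code c is token number c + 1.
    """
    entries = list(HEX_ENTRY.finditer(map_text))
    target = entries[code_point + 1]
    return map_text[:target.start()] + f"{new_value:02X}" + map_text[target.end():]


def map_position(code_point: int, per_row: int = 16) -> tuple:
    """(row, position counted from the right), rows numbered from 1 and
    not counting the line that holds the leading 0x00."""
    return code_point // per_row + 1, per_row - code_point % per_row


# Demo on a made-up map: leading 00, then 256 entries all set to 10.
demo_map = " 00\n" + "\n".join(" " + " ".join(["10"] * 16) for _ in range(16))
patched_map = set_ctype_entry(demo_map, ord("-"), 0x01)
print(map_position(ord("-")))  # (3, 3): third row, third from the right
```

`map_position` confirms the "third row, third from the right" location for 0x2D, given 16 entries per row.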

There is little on the web about this, but some forum commentary suggested that this was NOT the way to do it, and that one should instead write one's own full-text engine, because changing the <ctype> map has implications for the SQL parser. This may be true, but I suspect SQL parsing would require a stricter classification of characters. The SQL statement "SELECT a-b FROM test" still worked for me after this change.

Altering latin1.xml and restarting the server had the desired result.
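Why this matters for "T-72": a full-text parser splits text wherever it meets a non-word character. The Python model below is a simplification, assuming a character counts as part of a word exactly when its ctype entry has the upper-case, lower-case or digit bit set; MySQL's real parser applies more rules than this, and the tiny ASCII table is made up for the demo.

```python
WORD_BITS = 0x01 | 0x02 | 0x04  # upper-case, lower-case, digit (assumed word test)


def ascii_ctype() -> dict:
    """A tiny made-up ctype table for printable ASCII (hex-digit bits omitted)."""
    table = {" ": 0x48}
    for code in range(0x21, 0x7F):
        ch = chr(code)
        if ch.isupper():
            table[ch] = 0x01
        elif ch.islower():
            table[ch] = 0x02
        elif ch.isdigit():
            table[ch] = 0x04
        else:
            table[ch] = 0x10  # not white, not a word
    return table


def tokenize(text: str, ctype: dict) -> list:
    """Split text into words; any character without a word bit ends the current word."""
    words, current = [], ""
    for ch in text:
        if ctype.get(ch, 0) & WORD_BITS:
            current += ch
        elif current:
            words.append(current)
            current = ""
    if current:
        words.append(current)
    return words


stock_ctype = ascii_ctype()
dash_ctype = {**stock_ctype, "-": 0x01}  # '-' reclassified as an upper-case word char
print(tokenize("T-72 tank", stock_ctype))  # ['T', '72', 'tank']
print(tokenize("T-72 tank", dash_ctype))   # ['T-72', 'tank']
```

With the stock table the dash splits "T-72" into two fragments; after the change it stays one word.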

Finally, there does not appear to be a way to create a new character set or collation without recompiling. If this is true, it might be desirable for the standard distribution to include a "custom" character set for just this sort of thing.

However, what all that means, whether it's relevant here, and whether it works at all are questions I'm way too half-brained to answer. So I'd like to hear a word from you interweb übergeeks.

I'd also like some ice cream.


I think you can also create a Firefox extension, but I can't see how to do it yet.

IIRC it's just an XML structure.

No, it's easier than that: it's just a special hotlink.

It's called a "keyword link".

Make a hotlink to this:

http://www.google.com/search?hl=en&q=site%3Ahttp%3A%2F%2Fwww.battlefront.com%2Fcommunity%2F+%s&btnG=Search

Leave the "%s" in there, you'll need that later.

Then go to Manage Bookmarks and open the preferences of that link. The keyword field is empty; put something like "bfc" there and close the preferences window.

From now on you can go to the URL bar at the top and just enter

"bfc T-72"

(instead of "http://www.foo.com/blah" or whatever)

and it will run this search, limited to the forum, for whatever terms you enter after "bfc".

I can post a screenshot if needed.
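What Firefox does with the keyword is roughly a text substitution. This Python sketch only emulates the idea: it assumes the bookmark URL contains a literal `%s` and URL-encodes the typed terms with `quote_plus`. The exact encoding Firefox applies may differ, and `KEYWORDS` and `expand` are invented names.

```python
from urllib.parse import quote_plus

# Hypothetical keyword table: keyword -> bookmark URL with a literal "%s" placeholder.
KEYWORDS = {
    "bfc": "http://www.google.com/search?hl=en&q=site%3Ahttp%3A%2F%2F"
           "www.battlefront.com%2Fcommunity%2F+%s&btnG=Search",
}


def expand(location_bar: str) -> str:
    """Turn 'keyword search terms' into a URL: the first word selects the
    bookmark, and the rest is URL-encoded into the %s placeholder."""
    keyword, _, terms = location_bar.partition(" ")
    template = KEYWORDS.get(keyword)
    if template is None:
        return location_bar  # not a keyword: treat it as an ordinary address
    return template.replace("%s", quote_plus(terms))


print(expand("bfc T-72"))
```

`str.replace` is used rather than Python's `%` formatting because the URL already contains other percent escapes such as `%3A`.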
