Archery App Dev Guy

Developing an Archer Search Engine - Part 1

Background

Thirty-two thousand, two hundred eighty-one. That is the number of archers who have participated in a tournament on betweenends.com. For those who don't know, Between Ends is a website used in tournaments to keep track of scores digitally. A tablet is used to manually input scores, and participants can check their scores on their phones via the website. All of this data is seemingly accessible without an account or any sort of anti-bot software (e.g. Cloudflare, reCAPTCHA). After realizing this, I decided it would be cool if you could check your opponent's skill level in a tournament. And hey, I'm a software developer, so I thought I might as well. With the amount of data Between Ends has, you could do some crazy cool stuff! Average archer age? Most popular clubs? Total arrows shot across the site? It's all doable!

Statistics

Although the main point of the search engine is getting an individual archer's stats based on their name, I would also love to have a huge index of global archery stats, so I decided to make a list of all the statistics I needed. It came down to a large list, split by the scope each statistic belongs to: global, tournament, event, and archer. Although I won't upload the list yet, just know it's big, and still incomplete.

Acquiring the Data

I've done plenty of web scraping (the process of automatically extracting data from a website) before, so I knew where to start. In any Chromium-based browser, pressing Ctrl+Shift+I on a site will open the developer tools. From there, clicking the Network tab will show a list of all requests (this requires a page reload). By looking for specific requests (usually of type "xhr"), you can find where the data is coming from. After doing that on the homepage, the event list page, and the score viewer page, I had all the URLs I could find.

My first thought upon getting this was "dang, they have ZERO security". Not even a basic captcha. Something to note: upon arriving on the Between Ends front page, they load all ~4000 (I think) tournaments in existence, instead of just the top 100 or something. This feels a little excessive, and bad for bandwidth, but it works just fine so I don't have complaints. Once I had the URLs, I wrote a few Python scripts to scrape all the data into folders, and boom, I had the whole site of Between Ends stored on my computer within the hour. The first thing I tried doing was manually reading the data. And my gosh, this data is ugly. Who the heck wrote this format? Starting off, everything is an acronym. Just in the tournaments list there are "limg", "rimg", "bimg", "msg", and "msg_link". Although some of these are easily understandable ("msg" is "message"), many took some effort (e.g. comparing the data with the website) to finally understand. I discovered "limg" was left image, "rimg" was right image, and "bimg" was bottom image, each displayed in the matching spot on the tournament's page.
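For anyone who wants to try this themselves, here is a minimal sketch of what a scrape-into-folders script can look like. This is not the actual script from this project: the function names, the `fetch` parameter, and the URL in the usage comment are all made up for illustration, and the sketch assumes every endpoint returns JSON.

```python
import json
import time
import urllib.request
from pathlib import Path


def fetch_json(url):
    """Download one URL and parse the response body as JSON."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))


def scrape_to_folder(urls, out_dir, fetch=fetch_json, delay=0.0):
    """Save each response as <name>.json inside out_dir.

    `urls` maps a file name you pick (hypothetical) to the URL to fetch.
    Returns the names that failed so they can be retried later.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    failed = []
    for name, url in urls.items():
        try:
            data = fetch(url)
        except Exception:
            # Remember the failure and move on instead of aborting the whole run.
            failed.append(name)
            continue
        (out / f"{name}.json").write_text(json.dumps(data), encoding="utf-8")
        time.sleep(delay)  # pause between successful requests to go easy on the server
    return failed


# Placeholder URL, NOT a real Between Ends endpoint:
# scrape_to_folder({"tournaments": "https://example.com/api/tournaments"}, "data", delay=1.0)
```

Passing `fetch` in as a parameter makes it easy to swap in a fake downloader for testing, and the returned list of failures is handy because some requests may come back empty or broken.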

After figuring out everything in the tournaments, I moved on to the event pages. These weren't too confusing except for the "event_type" value, which I saw take many values such as "RankingEvent", "CombinedRankingEvent", and "MatchEvent". Those were the only ones I identified by manually looking through the data; in a future post I'll take a deeper look via a script. The ranking event and combined ranking event didn't seem to be any different, but the match event category came with its own custom formatting and page:

I tried pasting a normal event into a match event, but it didn't load, so apparently the two load their data differently. I kept that as a mental note for when I use this data in the future.
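A script to list every "event_type" value could be as small as the sketch below. It assumes the scraped events have already been loaded into a list of dicts. The function name is made up, and only the "event_type" key comes from the real data.

```python
from collections import Counter


def count_event_types(events):
    """Tally the "event_type" field across a list of event dicts.

    Events without the field are counted under "<missing>" instead of crashing.
    """
    return Counter(event.get("event_type", "<missing>") for event in events)


# Tiny made-up sample, just to show the shape of the result.
sample = [
    {"event_type": "RankingEvent"},
    {"event_type": "MatchEvent"},
    {"event_type": "RankingEvent"},
    {},
]
for event_type, count in count_event_types(sample).most_common():
    print(event_type, count)
```

Run over the full scrape, any value that was missed by eye would show up here along with how often it occurs.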

That was it for the event list, so I moved on to the full event data values. This was the first file that gave me real pain. It was chock-full of acronyms. And even worse, the acronyms weren't even consistent with the other data. For example, this file also had event name and event type, except here they were named "enm" and "etp". Like seriously, consistency isn't even hard. Have some tact, people! Display order got the same treatment and became "dor". However, this value never really seemed to differ from the order in the data, so I kinda just ignored it. Other acronyms in the event info (keep in mind most of these are educated guesses):

Honestly there are a solid 10+ more, and that's JUST from manually looking, not even from running over it with code. Even worse, sometimes some values just aren't present, for no apparent reason.
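One way to make these files bearable is to rename the acronyms once, right after loading. The sketch below only covers the names worked out so far in this post. The long names are my own choice, "message_link" is a guess by analogy with "msg", and `expand_keys` is a hypothetical helper.

```python
# Short key -> readable key. Meanings come from the post above,
# except "message_link", which is an assumption.
KEY_NAMES = {
    "limg": "left_image",
    "rimg": "right_image",
    "bimg": "bottom_image",
    "msg": "message",
    "msg_link": "message_link",
    "enm": "event_name",
    "etp": "event_type",
    "dor": "display_order",
}


def expand_keys(record, key_names=KEY_NAMES):
    """Return a copy of record with known acronyms renamed.

    Keys that are not in the table are kept unchanged, and keys that are
    absent from the record are simply not produced.
    """
    return {key_names.get(key, key): value for key, value in record.items()}


print(expand_keys({"enm": "Example Event", "etp": "MatchEvent"}))
```

Because unknown keys pass through untouched, the table can grow as more acronyms get decoded, and a record that is missing a value does not raise an error.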

That was just the event info; in the scores it gets worse. Sometimes the site just decides to return nothing, even though the associated event exists. Even when it does return score numbers, there are two formats, and once again more meaningless acronyms. The worst part of the score data, though, was the completely meaningless score values. The score data is effectively a giant string of characters, e.g. "899T99998TT9TT". Each character is one value. The digits are, well, the associated number value. "T"s are tens, "M"s are misses, "X"s are X's, etc. To be honest, past T, M, and X, I wouldn't expect any other values, but whoops, Between Ends always has a big screw you. All unknown values:

And, drumroll please, an exclamation mark! WHO CAME UP WITH THIS? What any of these could mean, I honestly do not know.
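To make the score format concrete, here is a rough sketch of a parser for those strings. It turns the characters described above into numbers and tallies everything it does not recognize, so the mystery characters can be counted instead of guessed at. Scoring "X" as 10 is an assumption on my part, and the function name is made up.

```python
from collections import Counter

# "T" = 10 and "M" = miss (0) follow from the description above.
# Scoring "X" as 10 is an assumption.
LETTER_VALUES = {"T": 10, "M": 0, "X": 10}


def parse_scores(raw):
    """Split a score string into arrow values plus a tally of unknown characters."""
    values = []
    unknown = Counter()
    for ch in raw:
        if ch in "0123456789":
            values.append(int(ch))
        elif ch in LETTER_VALUES:
            values.append(LETTER_VALUES[ch])
        else:
            # Not a digit, T, M, or X: record it rather than guessing a value.
            unknown[ch] += 1
    return values, unknown


values, unknown = parse_scores("899T99998TT9TT")
print(len(values), sum(values), dict(unknown))  # 14 129 {}
```

Summing the `unknown` counters across every scraped score string would show how common each strange character is and which events it shows up in.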

That was it, but after seeing all of that data, I was struggling to see how the site was still up and running. I concluded the only way to fully understand all the data would be to use the power of educated guesses, archery knowledge, and the site's code, which was mostly readable. Through similar web scraping shenanigans you can view how the site works with its own crappy data. Safe to say, they were struggling too. The code looks like it was written by a CS major who picked their major just for the money. It's bad... The formatting was horrible, the logic was hard to follow, and just thinking about it hurts me. I won't upload a snippet, but if you take a look at it yourself you can see how bad it is. Just interpreting it would require a ChatGPT+ plan and 70 oz of Powerade. And because of that, I still haven't gotten to understanding the site just yet. My task list just keeps getting bigger!

Results

For the search engine, I concluded the best design would be to pregenerate an index that maps names to tournaments. When you search for a specific person, the rest of their data is fetched from Between Ends on the spot and used to generate their statistics. The globally scoped data, however, has to be precalculated, because it is just too big to calculate each time someone loads the page. Since tournament scope and event scope data are at the bottom of my priorities, I still haven't decided whether to precalculate that data or generate it in real time.
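The name-to-tournaments index could look roughly like the sketch below. It assumes the scraped data has already been flattened into (archer name, tournament id) pairs, which is my own simplification. The function names and the sample data are hypothetical, and the matching is exact after lowercasing and whitespace cleanup only.

```python
from collections import defaultdict


def normalize_name(name):
    """Lowercase and collapse whitespace so "Jane  Doe" and "jane doe" match."""
    return " ".join(name.lower().split())


def build_name_index(entries):
    """Map each normalized archer name to the sorted tournament ids they appear in.

    `entries` is an iterable of (archer_name, tournament_id) pairs. That shape
    is an assumption about how the scraped data has been flattened.
    """
    index = defaultdict(set)
    for name, tournament_id in entries:
        index[normalize_name(name)].add(tournament_id)
    return {name: sorted(ids) for name, ids in index.items()}


def search(index, query):
    """Look up a name; returns an empty list when nobody matches."""
    return index.get(normalize_name(query), [])


# Made-up sample data, not real archers or tournaments.
index = build_name_index([("Jane Doe", 12), ("jane  doe", 7), ("Sam Lee", 12)])
print(search(index, " JANE DOE "))  # [7, 12]
```

The index only stores names and ids, so it stays small enough to ship with the search page, and the heavier per-archer data is fetched only for the tournaments a search actually returns.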

The first analysis script I wrote was for the search engine, not the global data. It generated, based on the data, an index of

The first two were to be used for the search engine, but the last one was to assist me in searching through the aforementioned unknown event data / score values.

Although this project is still VERY incomplete (as is apparent from all the data I still don't understand), I was able to get some approximate statistics from what I do currently understand, so I will share them here (keep in mind these are most DEFINITELY off by a good bit):

Conclusion

I'll definitely keep working on improving my understanding of the Between Ends data, and on the search engine. I would like everyone to know that the code (once completed or in a publishable state) is gonna be free and open source. For the less technical readers, that means the source code is public, and everyone can freely use, edit, and copy it. I don't believe in selling a tool that gives an unfair advantage to just a few people in tournaments. That's loser behavior. In the future, integrating the tool into a Chrome extension that automatically checks stats on the main Between Ends site, or into a mobile app, would be a great idea. Sadly, that's still a long way off. Anyone who wants to do their own research is welcome to! If you discover anything about the unknown values, please send me a message at this email: archeryguy@tuta.com. I look forward to this project's completion!