Make the Similar Scripts Feature work properly

§
Posted:
Edited:
The current Similar Scripts feature doesn't show anything relevant at all. The scripts listed are for random websites unrelated to my script's code, and they don't even run on the same website as my script.

None of the Similar Scripts listed for my scripts shares even a single line of code with my script, or does anything close to what my script does.

Please add an option to filter the Similar Scripts by website, so I can check for Similar Scripts that run on the same website that my script does.
§
Posted:
What "similar scripts" feature are you talking about?
§
Posted:
This feature works by compressing the two scripts together and seeing how compressible they are together. It's not looking for exact line matches. Though I don't know why it thinks your script is similar to anything else, it does seem to work at least part of the time.
§
Posted:
The page says that this URL shows scripts that might be based on my script. The problem is that they don't have anything in common. This is true for all of my scripts, which is why I think this feature is not working as it should.

Why would someone want to compress two scripts together and see how compressible they are together?
§
Posted:
It's a measurement of how similar they are. It clearly isn't working right in this case, but generally it does. I use it all the time.

I'd be open to suggestions on other methods of determining script similarity that would be performant enough to run on thousands of scripts, each being anywhere from a few KB to a few MB.
§
Posted:
Edited:
I think that a text comparison tool would work the way I'm describing.

Like https://text-compare.com/: if we put one script on one side and another script on the other side, we can get a "percentage" of how similar or different they are. There are lots of websites that do this, Greasy Fork included, since this is how the code history comparison tool works.

By "compressing two scripts together", do you mean that Greasy Fork adds all the code of one script into another script and sees if they will work or do something? That's dumb, and if that's the case, that's why the derivatives feature doesn't work. I think that the only way to compare how similar one script is to another would be to count the words, symbols and code lines that are equal between both scripts. Comparing the number of code lines and total characters would probably help too.
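To make the "count the equal lines" proposal concrete, here is a minimal sketch using Python's standard `difflib`. It only illustrates the suggestion above and is not how Greasy Fork works; the function name `line_similarity` and the two sample scripts are made up for the example.

```python
import difflib


def line_similarity(script_a: str, script_b: str) -> float:
    """Return a 0..1 ratio based on how many lines the two scripts share.

    This is the line-matching idea from the post above, not the
    compression-based method Greasy Fork uses.
    """
    lines_a = script_a.splitlines()
    lines_b = script_b.splitlines()
    # autojunk=False keeps difflib from ignoring very common lines.
    matcher = difflib.SequenceMatcher(None, lines_a, lines_b, autojunk=False)
    return matcher.ratio()


# Hypothetical sample scripts: the fork changes one line out of four.
original = "var a = 1;\nvar b = 2;\nvar c = a + b;\nconsole.log(c);"
fork = "var a = 1;\nvar b = 2;\nvar c = a * b;\nconsole.log(c);"

print(line_similarity(original, original))
print(line_similarity(original, fork))
```

A line-by-line matcher like this is easy to run for one pair of scripts, but its cost grows quickly with script length, so running it against every other script on the site is a separate question.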

It would be nice to implement this as something the user can opt in to or out of. For example, the user could use a button to show only derivatives of his script that have the same total number of code lines, or disable that to find similar derivatives that don't have the same number of code lines, and so on.
§
Posted:

"Like https://text-compare.com"

"I think that the only way to compare how similar one script is to another would be verifying the number of equal words, symbols and equal code lines between both scripts."

One script vs one other script is easy. The problem is that on every script update, it has to do the comparison against tens of thousands of scripts, each of which can be up to several megabytes. There needs to be a method that can accomplish this in a reasonable amount of time. The compression method takes 1-5 minutes to do this.

"By "compressing two scripts together", you mean that greasyfork add all the codes of one script into another script and see if they will work or do something?"

No, of course not. It's comparing the script size when compressed to the (script + other script) size when compressed. If the size goes up a lot, then they are very different. If the size goes up only a little, then they are very similar.
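As a rough illustration of the comparison just described, the sketch below measures how much the compressed size grows when a second script is appended to the first. It uses Python's `zlib` as a stand-in; the function names and sample scripts are assumptions made for this example, and Greasy Fork's real code and scoring formula are not shown in this thread.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Size in bytes of the data after deflate compression."""
    return len(zlib.compress(data, 9))


def size_increase(script: bytes, other: bytes) -> int:
    """Extra compressed bytes needed once `other` is appended to `script`.

    A small increase suggests `other` mostly repeats `script`;
    a large increase suggests the two have little in common.
    """
    return compressed_size(script + other) - compressed_size(script)


# Hypothetical sample scripts, kept well under 32 KB each.
original = "\n".join(
    "function f%d(x) { return x * %d + %d; }" % (i, i, i * 7) for i in range(200)
).encode("utf-8")
fork = original.replace(b"f42(", b"g42(")  # near-identical copy
unrelated = "\n".join(
    "const v%d = document.querySelector('#id%d');" % (i, i * 13) for i in range(200)
).encode("utf-8")

print("fork adds:", size_increase(original, fork))
print("unrelated adds:", size_increase(original, unrelated))
```

The fork should add far fewer compressed bytes than the unrelated script, which is the signal the feature relies on.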

§
Posted:
Edited:
I see. Now I understand why this feature doesn't work at all.

There's no point in showing users that other scripts on Greasy Fork have the same file size as their script. I can't really see any point in it.

I'm not sure why you are worrying so much about how long comparing one script against another would take. I prefer a feature that works but takes longer over a pointless feature that doesn't work. File size comparison has nothing to do with the script contents. For example, the slither.io trainer I made is a fork of another script (and my fork is "99%" identical to the original), yet that original script doesn't even appear in my derivatives list. That shows that something should change in this derivatives feature.

I understand what you mean about scripts being updated frequently and that kind of comparison taking too much time, but does every user update their scripts at least once a day? Do most Greasy Fork users update their scripts more than once a day? Probably not. It's better to have a feature that works than a feature that can't really be used.
If I'm asking about this feature and suggesting improvements, it means that there are other users who don't understand how it works either, and they may also want improvements to it.

If the feature can't or won't be improved or changed, then Greasy Fork should at least rename it and stop saying that the list contains similar scripts, and instead simply say that the list shows scripts with almost the same file size.

Most users probably don't check for derivatives of their scripts all the time, so even if they updated their script once a day, they would not go straight to check whether Greasy Fork found any derivatives. That's why, even if the comparison was done some days ago, I would still prefer a feature that works.

You could also implement a feature that shows when Greasy Fork last compared the user's script with other scripts, so the user would know when the last comparison check was.
Or you could add that "one script vs one other script" comparison feature, but instead of having it run "24/7" every time the user updates his script, you could simply add a button on the derivatives page. When the user clicks it, Greasy Fork would start comparing against all other scripts on Greasy Fork.
You could also show the user an estimate of how long the comparison would take to finish, so he could check the results after that time or day.

Besides the long processing time, is there anything preventing you from adding that feature on greasy fork?
§
Posted:
Edited:

"There's no point in showing to users that there are other scripts on greasy fork that has the same file size as their script, I can't really see any point of it."

That's not what it's doing, but I've tried explaining this to you 3 times and I don't feel like trying a 4th.

If you have a concrete suggestion for a library/algorithm that can accomplish checking for similarity, I can try it and see how it performs.

wOxxOm (Mod)
§
Posted:

The point of the algorithm is to check the reduction of the size, not the size itself. Thing is, compression algorithms work by finding similar patterns in the source so once it processed "script A" it has a dictionary of patterns, then it processes "script B" and uses that dictionary. So, in case "script B" is similar to "A" the dictionary of patterns will be reused a lot, thus the compressed output of "B" will be very very small as it'll be mostly a rehashing of "A". GreasyFork is looking at that last delta: the smaller it is, the more similar the two scripts are.
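The "dictionary of patterns" idea can be tried directly with zlib's preset-dictionary option. The sketch below primes the compressor with script A and then measures how small script B compresses. This is only a way to see the effect; per the other posts in this thread, Greasy Fork concatenates the scripts rather than using a preset dictionary. The function name `delta_size` and the sample data are assumptions for the example.

```python
import zlib


def delta_size(script_a: bytes, script_b: bytes) -> int:
    """Compressed size of B when the compressor has already 'seen' A.

    zlib only uses the tail of the dictionary that fits its window
    (at most 32 KB), so this is meaningful for small scripts only.
    """
    compressor = zlib.compressobj(level=9, zdict=script_a)
    return len(compressor.compress(script_b) + compressor.flush())


# Hypothetical scripts: B is A with one identifier renamed.
script_a = "\n".join(
    "function handler%d(e) { return e.value + %d; }" % (i, i * 3) for i in range(100)
).encode("utf-8")
script_b = script_a.replace(b"handler7(", b"renamed7(")

print("B alone:", len(zlib.compress(script_b, 9)))
print("B after A:", delta_size(script_a, script_b))
```

When B is close to A, the second number should be a small fraction of the first, which is the "last delta" described above.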

§
Posted:
Edited:
Oh, thanks. I kind of understand now, it's more about the file hash than the file size itself.

Well, that isn't working well either, as I said.

I don't have any library/algorithm that can check for similarities. JasonBarnabe said that it was easy, so I thought that he was a really good programmer and could write the library/algorithm himself.
§
Posted:
So I looked into this particular case a bit more, and I figured out what the problem was and fixed it. Technical details ahead.

As part of the calculation, the code is checking to see what the compressed size would be if the two scripts were identical. To do this, it simply concatenates the code to itself and compresses. However, due to the way the deflate algorithm works, a repeated string longer than 32KB can't be fully compressed. The Greasy Fork code accounted for this, however it was:

a) Taking the character count of the script rather than the size in bytes, which are different if there are multi-byte characters (like non-Latin ones).
b) Assuming the limit was exactly 32KB, when in fact it was a little less.

These bugs made the calculation incorrect when comparing certain scripts.
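Both points can be reproduced with a short sketch. It uses Python's `zlib` as a stand-in for whatever deflate implementation the site uses, and the helper name `self_concat_overhead` and the test data are assumptions for the example. Pseudo-random bytes are used because they do not compress on their own, so any saving comes only from the repeated copy.

```python
import random
import zlib


def self_concat_overhead(data: bytes) -> int:
    """Extra compressed bytes when `data` is appended to itself."""
    single = len(zlib.compress(data, 9))
    doubled = len(zlib.compress(data + data, 9))
    return doubled - single


# (a) Character count and byte count differ for multi-byte text.
sample = "// 你好, мир\n"
char_count = len(sample)
byte_count = len(sample.encode("utf-8"))
print("characters:", char_count, "bytes:", byte_count)

# (b) Deflate can only refer back a bounded distance (a bit under 32 KB),
# so a repeated copy farther away than that is not recognised.
rng = random.Random(0)
small = bytes(rng.getrandbits(8) for _ in range(16 * 1024))
large = bytes(rng.getrandbits(8) for _ in range(64 * 1024))
print("16 KB overhead:", self_concat_overhead(small))
print("64 KB overhead:", self_concat_overhead(large))
```

The 16 KB input should add very little when doubled, while the 64 KB input should add roughly its full size again, which is why the "identical scripts" baseline has to account for the window limit in bytes.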

After deploying the fix and rechecking the top ones on https://greasyfork.org/en/scripts/408656-slither-io-trainer-hack/derivatives, they no longer show as very similar. There are probably other cases out there that need to be rechecked, but I'll let the auto recheck process handle that.
§
Posted:
Edited:
Thank you. I'm glad that I could help at least a little bit by making this thread.
Sorry for not having a magical solution ready to implement on Greasy Fork...

At least now, for that script, one slither.io bot script also appears (0.613 [Diff]: Slither.io Bot-hack), but some of the others are still Chinese scripts that I can't even read haha, and most are random scripts lol... But thanks for the fix.

Even though I don't have a magical library to implement here or whatever, could you consider adding the features I mentioned?
Like explaining a bit how this feature works on every derivatives page, and adding some filters for the user?
For example, letting the user find derivatives that work only on the same website, or that were made before or after the script's publish date, among other things...
§
Posted:
Scores of 0.5 and 0.6 are actually pretty low on this scale. At that point they're similar as far as they're both JavaScript and about the same length. I probably should look into rescaling of the score.

I don't think it's necessary to explain how it works. The filters I think would be counter-productive. Right now it's showing top 100 regardless of how similar - I should put in a cut-off at maybe 0.7 or so.
§
Posted:
Thanks for the reply.
Okay then...

I'm just not sure whether hiding scripts that have less than 0.7 similarity would be good or not. In the case of the script I linked here, the most similar script is 0.613 [Diff] on the scale.

You as the admin could analyze a good number of pages like https://greasyfork.org/en/scripts/408656-slither-io-trainer-hack/derivatives and check whether this would be a good idea, based on how many scripts below 0.7 are completely random and how many are a bit similar, and then make your choice. At least if I were the admin, I would probably do that.

Good luck
§
Posted:
Now only showing ones with >= 0.75 similarity.
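For readers wondering what such a cut-off amounts to, here is a minimal sketch that combines the 0.75 threshold with the top-100 limit mentioned earlier in the thread. The function name, the data layout and the sample scores (other than the 0.613 quoted above) are made up for illustration and are not Greasy Fork's code.

```python
from typing import Dict, List, Tuple


def visible_derivatives(
    scores: Dict[str, float], cutoff: float = 0.75, limit: int = 100
) -> List[Tuple[str, float]]:
    """Keep scripts at or above the cut-off, most similar first."""
    kept = [(name, score) for name, score in scores.items() if score >= cutoff]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:limit]


# Hypothetical scores; only 0.613 comes from the thread.
example_scores = {
    "Slither.io Bot-hack": 0.613,
    "some near-identical fork": 0.97,
    "some partial copy": 0.80,
}

print(visible_derivatives(example_scores))
```

With these sample values the 0.613 entry is dropped, which matches the concern raised above that the closest match for that script falls below the threshold.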
