INDEX
Explanations
instances of the word "match" or variations thereof
Negative Logits
quirrel      -0.81
zzleHttp     -0.76
бенок        -0.74
thâu         -0.72
skyl         -0.70
kasarigan    -0.70
::-          -0.70
hehe         -0.69
Vader        -0.69
             -0.69
Positive Logits
match        2.63
Match        2.58
MATCH        2.58
Match        2.50
match        2.50
matches      2.44
MATCH        2.32
Matches      2.29
matches      2.06
Matches      2.02
Activations Density 0.050%