INDEX
Explanations
mentions of "Che" or relevant terms associated with cheating behavior
NEGATIVE LOGITS
Token      Logit
loo        -0.19
ners       -0.18
alez       -0.16
ouro       -0.16
SION       -0.15
ra         -0.15
arkan      -0.15
bserv      -0.15
neo        -0.15
ãĥ¼ãĥį     -0.15
POSITIVE LOGITS
Token      Logit
vron       0.23
vrolet     0.20
Che        0.20
-che       0.19
erokee     0.18
aper       0.17
che        0.17
ating      0.17
pch        0.17
apest      0.16
Activations Density 0.010%
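
The token `ãĥ¼ãĥį` in the negative logits list looks like a raw byte-level BPE vocabulary entry rather than readable text. The source does not say which tokenizer the model uses, so the following is only a sketch under the assumption that it is a GPT-2-style byte-level BPE, where every byte is shown as a stand-in printable character. The helper names `bytes_to_unicode` and `decode_token` are illustrative, not part of any dashboard API.

```python
def bytes_to_unicode():
    """Byte -> printable-character table used by GPT-2-style byte-level BPE.

    ASSUMPTION: the model behind this dashboard uses this scheme.
    """
    # Bytes that are already printable map to themselves.
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(0xA1, 0xAC + 1))
        + list(range(0xAE, 0xFF + 1))
    )
    cs = bs[:]
    n = 0
    # Every remaining byte is assigned a code point starting at 256.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return {b: chr(c) for b, c in zip(bs, cs)}


def decode_token(token):
    """Turn a vocabulary string back into the text it stands for."""
    char_to_byte = {c: b for b, c in bytes_to_unicode().items()}
    raw = bytes(char_to_byte[ch] for ch in token)
    # Tokens may split a multi-byte character, so do not fail on partial UTF-8.
    return raw.decode("utf-8", errors="replace")


# The vocabulary string from the negative logits list, written with escapes
# so the example does not depend on how this file itself is encoded.
garbled = "\u00e3\u0125\u00bc\u00e3\u0125\u012f"
print(decode_token(garbled))
```

If the assumption holds, the entry decodes to a pair of katakana characters, and plain ASCII tokens such as `Che` pass through unchanged.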