Explanations
vulgar and offensive language
NEGATIVE LOGITS
kinson         -0.15
Opport         -0.14
ULSE           -0.14
ecided         -0.14
arken          -0.13
yang           -0.13
maal           -0.13
yre            -0.13
озд            -0.13
stile          -0.13
POSITIVE LOGITS
Wheel           0.16
CommandEvent    0.15
untu            0.15
üzel            0.15
é϶              0.14
ordin           0.14
Permanent       0.14
BUM             0.14
[&              0.13
acen            0.13
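Lists like the two above are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and keeping the most negative and most positive token scores. This page does not state how its numbers were computed, so treat the following as a minimal sketch under that assumption: `top_logit_effects` is a hypothetical helper, and the vocabulary, decoder direction, and unembedding values are made up for illustration.

```python
from typing import List, Sequence, Tuple


def top_logit_effects(
    decoder_dir: Sequence[float],
    unembed: Sequence[Sequence[float]],  # assumed layout: [d_model][vocab_size]
    vocab: Sequence[str],
    k: int = 3,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Hypothetical helper: score every token as decoder_dir . unembed[:, token],
    then return the k lowest-scoring and k highest-scoring (token, score) pairs."""
    d_model = len(decoder_dir)
    scores = []
    for t, token in enumerate(vocab):
        # Dot product of the feature direction with this token's unembedding column.
        s = sum(decoder_dir[i] * unembed[i][t] for i in range(d_model))
        scores.append((token, s))
    ranked = sorted(scores, key=lambda pair: pair[1])  # ascending by score
    negative = ranked[:k]        # most negative first
    positive = ranked[::-1][:k]  # most positive first
    return negative, positive


# Toy example with a 2-dimensional residual stream and a 4-token vocabulary.
vocab = ["a", "b", "c", "d"]
decoder_dir = [1.0, -0.5]
unembed = [
    [0.2, -0.1, 0.0, 0.3],
    [0.4, 0.2, -0.2, 0.0],
]
neg, pos = top_logit_effects(decoder_dir, unembed, vocab, k=2)
print("NEGATIVE", neg)
print("POSITIVE", pos)
```

For a real model the vocabulary has tens of thousands of entries, so an array library and a single matrix-vector product would replace the Python loop; the ranking step is the same.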
Activation Density: 0.094%
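Activation density is usually read as the percentage of sampled tokens on which the feature's activation is above zero, so 0.094% would mean roughly one token in a thousand. The page does not define the threshold or the token sample, so the sketch below assumes a strict `> 0` cutoff; `activation_density` is a hypothetical helper and the activation values are invented.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Hypothetical helper: percentage of tokens whose activation exceeds threshold."""
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:  # assumed definition: strictly positive counts as "active"
            active += 1
    if total == 0:
        raise ValueError("no activations supplied")
    return 100.0 * active / total


# Toy sample: the feature fires on 1 token out of 1000.
acts = [0.0] * 999 + [2.5]
density = activation_density(acts)
print(f"Activation Density: {density:.3f}%")
```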