Explanations
references to vulgarity and profanity
Negative Logits
opic      -0.15
vt        -0.15
Ñħи       -0.15
dale      -0.14
agit      -0.14
/plugin   -0.14
yang      -0.14
maal      -0.14
illed     -0.14
czy       -0.14
Positive Logits
oten      0.14
spb       0.13
æķ¬       0.13
ê¶Į       0.13
edom      0.13
ascar     0.13
Rud       0.13
edd       0.13
íķ©       0.13
Reef      0.13
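Dashboards of this kind commonly build the logit lists by projecting the feature's decoder direction through the model's unembedding matrix and keeping the most negative and most positive tokens. The sketch below assumes exactly that; the actual pipeline behind these numbers may differ (for example in normalization or bias handling), and the function names `logit_effects` and `top_logits` are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def logit_effects(decoder_dir: Sequence[float],
                  w_unembed: Sequence[Sequence[float]],
                  vocab: Sequence[str]) -> Dict[str, float]:
    """Project a feature's decoder direction onto every vocabulary token.

    Assumption: decoder_dir has length d_model and w_unembed is a
    d_model x vocab_size matrix (rows = residual dims, columns = tokens).
    """
    effects: Dict[str, float] = {}
    for t, tok in enumerate(vocab):
        # Dot product of the decoder direction with token t's unembedding column.
        effects[tok] = sum(d * row[t] for d, row in zip(decoder_dir, w_unembed))
    return effects


def top_logits(effects: Dict[str, float],
               k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and the k most positive tokens."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])
    negative = ranked[:k]          # smallest values first
    positive = ranked[::-1][:k]    # largest values first
    return negative, positive


# Toy example with a 2-dimensional residual stream and a 4-token vocabulary.
vocab = ["a", "b", "c", "d"]
decoder_dir = [1.0, -0.5]
w_unembed = [[0.2, -0.1, 0.0, 0.3],
             [0.4, 0.2, -0.2, 0.0]]
neg, pos = top_logits(logit_effects(decoder_dir, w_unembed, vocab), k=2)
print("negative:", neg)
print("positive:", pos)
```

In the toy example, token `b` gets the most negative effect (-0.2) and token `d` the most positive (0.3), mirroring how the two lists above are ordered.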
Activation density: 0.017%
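Activation density is usually read as the percentage of tokens on which the feature fires; a density of 0.017% corresponds to roughly 17 firing tokens per 100,000. The sketch below assumes "fires" means an activation strictly above zero; the threshold the dashboard actually uses is not stated here, and `activation_density` is a hypothetical helper.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose feature activation exceeds `threshold`.

    Assumption: a token counts as "firing" when its activation > threshold.
    """
    total = 0
    firing = 0
    for a in activations:
        total += 1
        if a > threshold:
            firing += 1
    if total == 0:
        raise ValueError("no activations given")
    return 100.0 * firing / total


# Toy example: 2 firing tokens out of 10,000 gives a density of 0.02%.
acts = [0.0] * 9998 + [1.3, 0.7]
print(f"{activation_density(acts):.3f}%")
```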