Explanations
phrases that indicate a sense of acceptance or rationalization of circumstances
Negative Logits
.comp        -0.15
wap          -0.15
igger        -0.15
IENT         -0.15
azu          -0.14
Dipl         -0.14
leneck       -0.14
ultimately   -0.14
hood         -0.14
adam         -0.14
Positive Logits
emie          0.17
cky           0.16
urette        0.16
coli          0.15
ĵ¨            0.15
bro           0.15
Serg          0.15
ÙĤÙĤ          0.15
Jarvis        0.14
ovaly         0.14
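The two lists above rank vocabulary tokens by how strongly this feature raises or lowers their logits. A minimal sketch of how such lists can be produced, assuming (as is common for sparse-autoencoder feature dashboards, but not stated on this page) that each value is the feature's decoder direction projected through the model's unembedding matrix. The function names `logit_effects` and `top_and_bottom` and the toy vocabulary and weights are hypothetical, chosen only for illustration.

```python
# Hypothetical sketch: rank vocabulary tokens by how much a feature's
# decoder direction pushes their logits up or down (a "logit lens" view).
# Assumption: value per token = decoder_dir @ W_U (not confirmed by the page).


def logit_effects(decoder_dir, unembed):
    """Return one logit contribution per vocab token: decoder_dir @ unembed.

    decoder_dir: list of d_model floats (the feature's output direction).
    unembed: d_model rows, each a list of vocab_size floats (W_U).
    """
    vocab_size = len(unembed[0])
    return [
        sum(decoder_dir[i] * unembed[i][t] for i in range(len(decoder_dir)))
        for t in range(vocab_size)
    ]


def top_and_bottom(effects, vocab, k):
    """Return the k most positive and k most negative (token, value) pairs."""
    pairs = list(zip(vocab, effects))
    positive = sorted(pairs, key=lambda p: p[1], reverse=True)[:k]
    negative = sorted(pairs, key=lambda p: p[1])[:k]
    return positive, negative


# Toy example: d_model = 2, vocabulary of 4 made-up tokens.
vocab = ["a", "b", "c", "d"]
decoder_dir = [1.0, -0.5]
unembed = [
    [0.2, -0.1, 0.0, 0.3],  # row for residual dimension 0
    [0.4, 0.2, -0.2, 0.0],  # row for residual dimension 1
]
effects = logit_effects(decoder_dir, unembed)
pos, neg = top_and_bottom(effects, vocab, k=2)
print(pos)  # most-promoted tokens, largest first
print(neg)  # most-suppressed tokens, most negative first
```

With real model weights the same two calls would yield lists shaped like the ones above, with `k=10` and the full vocabulary.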
Activation Density: 0.001%
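A density of 0.001% means the feature fires on roughly 1 token in every 100,000. A minimal sketch of the arithmetic, assuming density is the percentage of sampled tokens on which the feature's activation exceeds zero; the `activation_density` helper and its `threshold` parameter are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of tokens on which the feature fires (activation > threshold)."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# A feature that fires on 1 token out of 100,000 has a density of 0.001%.
acts = [0.0] * 100_000
acts[42] = 3.7  # the single token where the feature is active
print(f"{activation_density(acts):.3f}%")  # prints "0.001%"
```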