INDEX
Explanations
expressions of belief or affirmation
Negative Logits
arial        -0.15
Ïģκ          -0.14
ocrat        -0.14
enie         -0.14
ÏĨι          -0.14
ummings      -0.14
mia          -0.14
acea         -0.14
rips         -0.13
suit         -0.13
Positive Logits
ably          0.22
able          0.19
ye            0.18
me            0.17
abel          0.16
ability       0.16
yourselves    0.15
yourself      0.15
ibold         0.15
-you          0.15
Activations Density: 0.037%