INDEX
Explanations
phrases expressing strong opinions and evaluations about various topics
Negative Logits
prot        -0.15
лан         -0.14
isons       -0.14
umen        -0.14
Hudson      -0.13
usalem      -0.13
ueva        -0.13
Caldwell    -0.13
akin        -0.13
prit        -0.13
Positive Logits
imdi        0.16
anke        0.15
/vnd        0.14
tu          0.14
lund        0.14
akit        0.14
scribe      0.14
oriously    0.13
å§          0.13
.escape     0.13
Activations Density: 0.729%