Explanations
statements about personal relationships and controversial topics
Negative Logits
Token       Logit
amburger    -0.16
elden       -0.16
avia        -0.16
eries       -0.15
ả           -0.15
EGA         -0.15
fan         -0.14
eros        -0.14
blem        -0.14
aren        -0.14
Positive Logits
Token       Logit
å¥Ī         0.17
pac         0.16
Dmit        0.15
oni         0.14
iger        0.14
/vnd        0.14
borg        0.14
ELLOW       0.13
оло         0.13
ANJI        0.13
Activations Density: 1.178%