Explanations
language related to perception and first impressions
associated with superficial appearances
surface appearance and first glance
Negative Logits
Lieber         -0.52
↵              -0.51
préférences    -0.47
vál            -0.46
сіб            -0.46
chave          -0.45
详细信息        -0.45
(              -0.45
anera          -0.44
жере           -0.44
Positive Logits
outwardly      1.12
seeming        1.11
apparente      1.04
看似            0.97
decep          0.95
itſelf         0.95
seemingly      0.94
appearances    0.93
outward        0.91
Appearances    0.91
Activations Density: 0.177%