Explanations
phrases related to self-identity and perception
Negative Logits
будь            -0.14
сион            -0.13
έργ             -0.13
understandably  -0.12
alles           -0.12
尚              -0.12
CKER            -0.12
alım            -0.12
derabad         -0.11
.ta             -0.11
Positive Logits
actually        1.01
really          0.89
actually        0.84
actual          0.82
realmente       0.79
Actually        0.77
really          0.75
Really          0.75
truly           0.73
wirklich        0.73
Activations Density 1.281%