INDEX
Explanations
negations and expressions of personal experience or belief
Negative Logits
Token     Logit
ss        -0.16
uide      -0.15
IONS      -0.14
SS        -0.14
ij        -0.14
mainly    -0.14
oton      -0.14
isset     -0.13
eza       -0.13
main      -0.13
Positive Logits
Token     Logit
å͝ä¸Ģ     0.34
един      0.31
unique    0.26
jedin     0.24
alone     0.24
einz      0.24
unique    0.22
único     0.22
earliest  0.22
única     0.22
Activations Density 0.206%
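
The page does not define how the density figure is computed. A minimal sketch, assuming it is the percentage of sampled tokens on which the feature has a non-zero activation; the function name, the threshold parameter, and the sample data are all hypothetical:

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the percentage of activations strictly above `threshold`.

    Assumption: "density" means the share of tokens on which the feature
    fires. The dashboard's actual definition is not given in the source.
    """
    if len(activations) == 0:
        raise ValueError("activations must not be empty")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical sample: the feature fires on 2 of 1000 tokens.
sample = [0.0] * 998 + [0.4, 1.2]
print(f"Activations Density {activation_density(sample):.3f}%")
```

Under this reading, a value of 0.206% would mean the feature fires on roughly 2 of every 1000 sampled tokens.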