INDEX
Explanations
references to personal identity and feelings of connection or isolation
Negative Logits
ancer         -0.16
anta          -0.16
Impossible    -0.15
impossible    -0.14
.configure    -0.13
lên           -0.13
n             -0.13
arken         -0.13
öz            -0.13
Impossible    -0.13
Positive Logits
inside        0.20
somewhere     0.20
within        0.18
داخل          0.17
near          0.17
ợ             0.17
Inside        0.17
inside        0.16
_inside       0.16
Inside        0.16
Activations Density: 0.130%