INDEX
Explanations
sentences emphasizing communal responsibility and social bonds
NEGATIVE LOGITS
منه           -0.15
ooo           -0.14
haven         -0.13
oton          -0.13
abh           -0.13
ait           -0.13
StateManager  -0.13
_Private      -0.13
obao          -0.13
داد           -0.12
POSITIVE LOGITS
whether       0.23
whether       0.18
Whether       0.17
yes           0.17
""            0.16
yes           0.16
Whether       0.16
or            0.15
brick         0.15
even          0.15
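The page does not say how the two logit lists were produced. A common approach in sparse-autoencoder dashboards is to project the feature's decoder direction through the model's unembedding matrix and read off the most negative and most positive tokens. The sketch below assumes that approach; `logit_effects`, `decoder_dir`, `W_U` (shape `d_model x vocab_size`) and `vocab` are hypothetical names, and the toy numbers are invented for illustration, not taken from this feature.

```python
import numpy as np

def logit_effects(decoder_dir, W_U, vocab, k=10):
    """Hypothetical helper: project one feature's decoder direction through
    the unembedding and return the k most negative and k most positive tokens."""
    scores = decoder_dir @ W_U            # shape: (vocab_size,)
    order = np.argsort(scores)            # token indices, ascending by score
    negative = [(vocab[i], float(scores[i])) for i in order[:k]]
    positive = [(vocab[i], float(scores[i])) for i in order[::-1][:k]]
    return negative, positive

# Toy example: d_model = 2, vocabulary of 4 tokens (invented values).
decoder_dir = np.array([1.0, 0.0])
W_U = np.array([[0.2, -0.1, 0.05, -0.3],
                [0.0,  0.0, 0.0,   0.0]])
vocab = ["whether", "haven", "yes", "ooo"]
neg, pos = logit_effects(decoder_dir, W_U, vocab, k=2)
```

Here `neg` lists `ooo` then `haven`, and `pos` lists `whether` then `yes`, mirroring the shape of the lists on this page. Real dashboards may also fold in a final layer-norm scale, which this sketch ignores.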
Activation density: 0.402%
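Activation density is usually read as the fraction of token positions on which the feature fires, so 0.402% means roughly 4 positions in every 1,000. A minimal sketch, assuming "fires" means an activation strictly above zero (the dashboard does not state its threshold); `activation_density` is a hypothetical helper name.

```python
import numpy as np

def activation_density(activations, threshold=0.0):
    """Hypothetical helper: fraction of token positions where the
    feature's activation exceeds the threshold."""
    acts = np.asarray(activations, dtype=float)
    return float(np.mean(acts > threshold))

# Toy example: the feature fires on 2 of 5 positions, a density of 40%.
density = activation_density([0.0, 1.2, 0.0, 0.0, 3.4])
```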