INDEX
Explanations
expressions related to social and political aspirations and dilemmas
Negative Logits
Token         Logit
ورت           -0.16
alia          -0.16
anc           -0.15
_ipc          -0.14
conc          -0.14
oko           -0.14
sg            -0.13
057           -0.13
figcaption    -0.13
ux            -0.13
Positive Logits
Token         Logit
iaz           0.15
ling          0.14
DD            0.14
/DD           0.14
atie          0.13
lings         0.13
Архів         0.13
품            0.13
Reality       0.13
iece          0.13
Activations Density 0.074%