INDEX
Explanations
themes of conflict between personal beliefs and actions
Negative Logits
ishi      -0.16
pz        -0.16
管        -0.15
pbs       -0.14
Gir       -0.14
æ±        -0.14
eselect   -0.14
иÑĤа      -0.14
esa       -0.14
IFF       -0.13
Positive Logits
Erd         0.17
actions     0.15
directions  0.15
å¥Ī         0.14
æ¯Ľ         0.14
ACTIONS     0.14
aign        0.14
Actions     0.14
cci         0.14
ãģĻãģĻ      0.14
Activations Density 0.116%