INDEX
Explanations
concepts related to moral philosophy and societal structures
Negative Logits
frank     -0.15
([[       -0.15
outs      -0.14
eca       -0.14
eka       -0.14
egade     -0.14
ionate    -0.14
Saga      -0.14
vier      -0.13
ker       -0.13
Positive Logits
rist         0.16
Carlson      0.15
cond         0.14
Minor        0.14
rahim        0.13
HAL          0.13
å§           0.13
_REGISTRY    0.13
entai        0.13
ãĤ¡          0.13
Activations Density: 0.003%