INDEX
Explanations
concepts related to grounding and self-awareness
Negative Logits
otland    -0.17
orda      -0.15
wich      -0.15
.gnu      -0.14
latin     -0.14
Trace     -0.14
vertime   -0.14
erken     -0.14
strict    -0.14
Strict    -0.14
Positive Logits
Sad       0.23
Sad       0.19
sad       0.17
Ling      0.17
CSR       0.16
inclus    0.15
clusive   0.15
vive      0.15
Inner     0.14
ocre      0.14
Activations Density 0.001%