Explanations
concepts related to critical thinking and self-awareness
Negative Logits
onia          -0.17
aser          -0.15
ーズ          -0.15
usz           -0.15
eah           -0.14
acher         -0.14
avanaugh      -0.14
thoughtful    -0.14
思想          -0.14
Gale          -0.14
Positive Logits
action        0.34
-action       0.28
actions       0.28
action        0.27
ACTION        0.27
'action       0.26
Action        0.23
actions       0.23
Action        0.23
действия      0.23
Activations Density 0.131%
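
The density figure above can be read as the share of sampled tokens on which the feature fires. The sketch below assumes that reading (activation strictly greater than zero counts as firing). The function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and do not come from this page.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" means active tokens divided by all sampled tokens.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample, made up for illustration: 131 active tokens in 100,000.
sample = [1.0] * 131 + [0.0] * (100_000 - 131)
density = activation_density(sample)
print(f"{density:.3%}")
```

A density this low would mean the feature is sparse: it is silent on the vast majority of tokens and fires only on a narrow set of contexts.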