Explanations
self-reflective questions and statements
Negative Logits
heny         -0.73
emis         -0.72
Mub          -0.68
Syndicate    -0.65
microsoft    -0.64
effective    -0.64
rought       -0.63
ritz         -0.63
von          -0.62
iens         -0.62
Positive Logits
selves           0.75
mate             0.73
ħĭ               0.70
åĤ               0.69
çīĪ              0.69
selves           0.69
subconscious     0.68
unconsciously    0.67
creatively       0.66
ens              0.66
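Three of the positive-logit tokens (ħĭ, åĤ, çīĪ) look garbled but are more likely raw byte-level tokens than encoding damage. The sketch below assumes the model uses a GPT-2-style byte-level BPE vocabulary, which the dashboard does not state; the helper names `byte_to_unicode_table` and `token_to_bytes` are hypothetical and exist only for this illustration.

```python
def byte_to_unicode_table():
    """Build a GPT-2-style map from byte values (0-255) to printable characters.

    Assumption: the tokenizer follows the GPT-2 byte-level convention, where
    printable Latin-1 bytes map to themselves and every other byte is shifted
    to a code point at or above 256.
    """
    printable = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    table = {b: chr(b) for b in printable}
    shift = 0
    for b in range(256):
        if b not in table:
            table[b] = chr(256 + shift)
            shift += 1
    return table


# Inverse map: displayed character -> original byte value.
UNICODE_TO_BYTE = {ch: b for b, ch in byte_to_unicode_table().items()}


def token_to_bytes(token):
    """Recover the raw bytes behind a displayed byte-level token string."""
    return bytes(UNICODE_TO_BYTE[ch] for ch in token)


for token in ["ħĭ", "åĤ", "çīĪ"]:
    raw = token_to_bytes(token)
    # errors="replace" keeps incomplete UTF-8 fragments from raising.
    print(token, raw.hex(), raw.decode("utf-8", errors="replace"))
```

Under that assumption, çīĪ corresponds to the bytes e7 89 88, which is the UTF-8 encoding of 版, while ħĭ and åĤ are incomplete UTF-8 fragments that only form characters when combined with neighboring tokens.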
Activations Density 7.732%
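The dashboard does not define how the density figure is computed. A common reading, assumed here, is the fraction of sampled tokens on which the feature's activation is above zero. The function name `activation_density`, the `threshold` parameter, and the sample values are all hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" means the share of sampled tokens on which the
    feature fires at all; the dashboard itself does not confirm this.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Made-up activations for eight tokens; two of them are non-zero.
sample = [0.0, 0.0, 1.3, 0.0, 0.2, 0.0, 0.0, 0.0]
print(f"{activation_density(sample):.3%}")
```

Read this way, a density of 7.732% would mean the feature fires on roughly one in thirteen sampled tokens.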