INDEX
Explanations
terms related to self-description or self-evaluation
Negative Logits
Jensen     -0.16
lj         -0.16
_acquire   -0.15
fixed      -0.15
पन         -0.14
abwe       -0.14
agner      -0.14
tie        -0.14
hung       -0.14
akest      -0.14
Positive Logits
/self      0.38
Self       0.29
Self       0.28
(Self      0.27
self       0.26
self       0.26
SELF       0.25
-self      0.24
SELF       0.24
=self      0.22
Activations Density 0.030%