INDEX
Explanations
elements related to self-reflection and personal agency
Negative Logits
ционный            -0.15
obao               -0.15
iyah               -0.14
dle                -0.14
omite              -0.14
Insensitive        -0.13
чил                -0.13
уточ               -0.13
elib               -0.13
.openConnection    -0.13
Positive Logits
self     0.73
self     0.64
Self     0.64
-self    0.61
SELF     0.60
Self     0.58
SELF     0.55
_self    0.54
(self    0.52
self     0.51
Activations Density 0.296%