Explanations
references to self-identity or self-concept
Negative Logits
Jensen                          -0.16
ContentLoaded                   -0.15
ComVisible                      -0.15
jadx                            -0.14
etin                            -0.14
spb                             -0.14
-transitional                   -0.14
监听页面 ("listen page")        -0.14
antid                           -0.13
сь (Russian subword)            -0.13
Positive Logits
lessness                         0.23
hood                             0.23
preservation                     0.23
Preservation                     0.22
reliance                         0.21
ish                              0.20
ISH                              0.20
uff                              0.19
LESS                             0.18
defense                          0.18
Activation density: 0.013%