INDEX
Explanations
terms related to self-identification and self-reliance
Negative Logits
herits      -0.15
teri        -0.15
ele         -0.15
pearance    -0.14
_UNUSED     -0.14
robat       -0.14
Dud         -0.14
awe         -0.14
yne         -0.14
\Type       -0.14
Positive Logits
taught       0.19
proclaimed   0.19
iew          0.18
pedia        0.17
-described   0.17
立て          0.17
emade        0.16
described    0.16
tapes        0.16
ta           0.16
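The dashboard does not say how these logit lists are produced. A common approach, assumed here, is to project the feature's decoder vector through the model's unembedding matrix and report the lowest- and highest-scoring tokens. The sketch below follows that assumption; `logit_effects`, `decoder_dir` and `unembed` are hypothetical names, and the toy numbers are not taken from this feature.

```python
from typing import List, Sequence, Tuple


def logit_effects(
    decoder_dir: Sequence[float],
    unembed: Sequence[Sequence[float]],  # assumed layout: [d_model][vocab]
    vocab: Sequence[str],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Hypothetical helper: score every token by the dot product of the
    feature's decoder direction with that token's unembedding column,
    then return the k most negative and k most positive (token, score) pairs."""
    d_model = len(decoder_dir)
    scores = [
        sum(decoder_dir[m] * unembed[m][t] for m in range(d_model))
        for t in range(len(vocab))
    ]
    # Sort ascending so the most negative scores come first.
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    negative = ranked[:k]
    positive = ranked[::-1][:k]
    return negative, positive


# Toy example: a 2-dimensional residual stream and a 3-token vocabulary.
neg, pos = logit_effects(
    decoder_dir=[1.0, -0.5],
    unembed=[[0.2, -0.1, 0.0], [0.0, 0.2, -0.2]],
    vocab=["taught", "herits", "pedia"],
    k=1,
)
print(neg, pos)
```

With real weights, `k=10` would yield two ten-entry lists shaped like the tables above.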
Activation Density: 0.012%
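Assuming activation density means the percentage of tokens on which the feature's activation is above zero (the dashboard does not define it), a density of 0.012% corresponds to about 12 firing tokens per 100,000. A minimal sketch of that calculation, with `activation_density` as a hypothetical helper name:

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Hypothetical helper: return the percentage of tokens whose
    feature activation is strictly greater than `threshold`."""
    total = 0
    firing = 0
    for a in activations:
        total += 1
        if a > threshold:
            firing += 1
    if total == 0:
        raise ValueError("no activations supplied")
    return 100.0 * firing / total


# 12 firing tokens out of 100,000 gives a density of about 0.012%.
sample = [0.0] * 99_988 + [1.0] * 12
print(activation_density(sample))
```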