INDEX
Explanations
words associated with identity and self-reflection
NEGATIVE LOGITS
**          -0.13
avatar      -0.13
âĢĮ         -0.12
wiÄħ        -0.12
exit        -0.12
еÑĤелÑĮ     -0.12
-           -0.12
OrNull      -0.11
orama       -0.11
anmar       -0.11
POSITIVE LOGITS
/Foundation  0.15
/Framework   0.14
ongyang      0.13
!=-          0.13
026          0.13
ecko         0.13
BITTE        0.13
lessly       0.12
ebo          0.12
nts          0.12
Activations Density: 1.923%