INDEX
Explanations
expressions related to introspection and self-awareness
Negative Logits
.authenticate  -0.15
oya            -0.15
itre           -0.14
oku            -0.14
itr            -0.13
shint          -0.13
्यव            -0.13
iedades        -0.13
osterone       -0.13
Introduced     -0.13
Positive Logits
imagination    0.48
imag           0.46
imagin         0.39
fantas         0.38
fantasy        0.38
fantasies      0.35
dream          0.35
imag           0.35
imagining      0.35
imagin         0.35
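The two lists above rank tokens by how strongly this feature pushes their logits down or up. A minimal sketch of how such lists can be produced, assuming (as is common for SAE feature dashboards, but not stated here) that each value is the dot product of the feature's decoder direction with a token's unembedding vector. The functions `logit_effects` and `top_and_bottom` and the toy vectors are hypothetical and for illustration only; they are not this dashboard's data or code.

```python
from typing import Dict, List, Tuple

def logit_effects(decoder_dir: List[float],
                  unembed: Dict[str, List[float]]) -> Dict[str, float]:
    """Hypothetical: project a feature's decoder direction onto each token's unembedding vector."""
    return {tok: sum(d * u for d, u in zip(decoder_dir, vec))
            for tok, vec in unembed.items()}

def top_and_bottom(effects: Dict[str, float], k: int
                   ) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and k most positive (token, value) pairs."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])  # ascending by value
    negative = ranked[:k]            # most negative first
    positive = ranked[::-1][:k]      # most positive first
    return negative, positive

# Toy 2-dimensional example (made-up vectors, chosen so the values echo the lists above).
decoder = [1.0, 0.0]
unembed = {
    "imagination": [0.48, 0.1],
    "dream": [0.35, 0.2],
    "oya": [-0.15, 0.3],
    "itre": [-0.14, 0.0],
}
neg, pos = top_and_bottom(logit_effects(decoder, unembed), 2)
print(neg)  # [('oya', -0.15), ('itre', -0.14)]
print(pos)  # [('imagination', 0.48), ('dream', 0.35)]
```

Repeated entries such as "imag" and "imagin" in the positive list are likely distinct vocabulary items (for example, with and without a leading space) whose whitespace was lost when the page was copied.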
Activation density: 0.246%
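An activation density of 0.246% means the feature fires on roughly 246 of every 100,000 tokens. A minimal sketch of the computation, assuming density is the percentage of tokens whose activation exceeds zero; the function name `activation_density` and its `threshold` parameter are hypothetical.

```python
from typing import Sequence

def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens on which the feature's activation exceeds `threshold`."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)

# 246 firing tokens out of 100,000 gives a density of about 0.246%.
acts = [0.0] * 99_754 + [1.0] * 246
print(round(activation_density(acts), 3))  # 0.246
```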