Explanations
phrases related to awareness and self-awareness
Negative Logits
Token     Logit
eko       -0.16
roller    -0.15
ammable   -0.15
ahoma     -0.14
embros    -0.14
uebas     -0.14
sta       -0.14
otype     -0.14
antro     -0.14
imenti    -0.14
Positive Logits
Token     Logit
fulness   0.20
ness      0.18
/alert    0.16
ırak      0.15
-aware    0.15
732       0.15
ä¹İ       0.14
akit      0.14
delt      0.14
684       0.14
Activations Density: 0.035%