INDEX
Explanations
instances of knowledge and awareness in various contexts
Negative Logits
Token     Logit
igo       -0.18
alars     -0.15
antro     -0.15
anca      -0.14
acades    -0.14
/or       -0.14
ieux      -0.14
wizard    -0.14
Ñħи       -0.14
imat      -0.13
Positive Logits
Token     Logit
-how      0.16
uckle     0.16
rf        0.15
upp       0.14
æĤī       0.14
ession    0.14
zia       0.14
arth      0.14
akk       0.13
ORTH      0.13
Activations Density: 0.100%