INDEX
Explanations
phrases conveying existence or location of items or concepts
Negative Logits
linger      -0.19
rs          -0.18
iro         -0.15
net         -0.14
æł·         -0.14
daylight    -0.14
amo         -0.14
AA          -0.14
adÄĽ        -0.14
pert        -0.14
Positive Logits
ensing      0.16
ÙĴØŃ        0.16
eken        0.16
kea         0.15
agna        0.15
ikt         0.15
Falsy       0.14
.camel      0.14
ocha        0.14
Briggs      0.14
Activations Density 0.031%