INDEX
Explanations
references to past events or situations
NEGATIVE LOGITS
endon    -0.18
lei      -0.17
annes    -0.17
åύ       -0.17
fter     -0.15
iams     -0.14
viÄį     -0.14
ered     -0.14
thing    -0.14
phan     -0.14
POSITIVE LOGITS
ebin       0.37
imes       0.34
ime        0.32
iche       0.31
tense      0.29
ries       0.28
/current   0.28
eur        0.27
ures       0.27
oral       0.25
Activations Density: 0.032%