INDEX
Explanations
temporal references, particularly dates and years
Negative Logits
.Dial     -0.16
atism     -0.15
itel      -0.14
ogui      -0.14
atre      -0.14
Rica      -0.14
hab       -0.14
åĮº       -0.14
Ð¡Ð¡Ðł    -0.13
les       -0.13
Positive Logits
utherland  0.15
Throne     0.15
tow        0.15
ohl        0.15
atcher     0.15
inds       0.15
ë          0.14
ãĥ³ãĥĩ     0.14
ksi        0.14
eax        0.14
Activations Density 0.018%