INDEX
Explanations
references to public transportation, specifically buses and related transit terms
Negative Logits
Token    Logit
eer      -0.21
SPA      -0.17
aires    -0.17
amura    -0.17
üst      -0.16
ORY      -0.15
ustin    -0.15
e        -0.15
eut      -0.15
ately    -0.15
Positive Logits
Token    Logit
queda    0.27
loads    0.25
(es      0.23
ines     0.23
kers     0.22
iens     0.22
inness   0.22
load     0.21
ier      0.21
INES     0.21
Activations Density: 0.015%