INDEX
Explanations
references to public transportation, specifically buses
Negative Logits
Token    Logit
eer      -0.20
SPA      -0.17
amura    -0.17
aires    -0.17
e        -0.16
ately    -0.16
eut      -0.15
оров     -0.15
ustin    -0.15
溶       -0.15
Positive Logits
Token    Logit
queda    0.27
inness   0.24
loads    0.24
iens     0.23
ier      0.23
INESS    0.23
ines     0.22
iest     0.22
(es      0.22
pir      0.21
Activations Density: 0.014%