INDEX
Explanations
references to vehicles, particularly cars
NEGATIVE LOGITS
leground   -0.15
buz        -0.15
UGE        -0.15
adele      -0.15
urum       -0.15
ghi        -0.14
odb        -0.14
бом        -0.14
Jug        -0.14
mand       -0.14
POSITIVE LOGITS
alse       0.18
ocker      0.16
agan       0.16
immel      0.15
afe        0.14
afc        0.14
lio        0.14
okol       0.14
vise       0.14
ísto       0.14
Activations Density 0.029%