INDEX
Explanations
references to cars or automobiles
Negative Logits
olato        -0.51
Blitz        -0.50
}),          -0.50
yto          -0.49
Gottfried    -0.49
pena         -0.45
enei         -0.45
onde         -0.45
Nielsen      -0.44
ede          -0.44
Positive Logits
car          1.48
cars         1.26
Car          1.24
Car          1.22
car          1.17
Cars         1.15
Cars         1.10
cars         1.09
voiture      0.96
CARS         0.95
Activations Density 0.015%