INDEX
Explanations
references to the automotive company 'Ford'
references to the Ford brand
Negative Logits
Flavoring     -0.77
Ń·            -0.74
anwhile       -0.72
ablishment    -0.71
terday        -0.70
newsp         -0.66
laus          -0.66
HIP           -0.64
udic          -0.63
KD            -0.63
Positive Logits
Ford          1.06
Ford          1.05
ragon         0.91
ham           0.85
rera          0.80
shire         0.80
lon           0.78
bies          0.77
Mustang       0.77
neau          0.75
Activations Density: 0.007%