Explanations
references to specific car models
references to specific car models, particularly from Tesla
Negative Logits
gob           -0.72
Barton        -0.70
uca           -0.68
gore          -0.64
overfl        -0.64
Greenwald     -0.64
prejudice     -0.63
english       -0.63
Bulgar        -0.62
foreigners    -0.62
Positive Logits
Model         3.89
Model         2.83
Models        2.56
model         2.34
model         2.29
models        2.01
models        1.84
modeling      1.74
modelling     1.60
modeled       1.51
Activations Density: 0.014%