Explanations
car features and performance
Negative Logits
Token         Logit
炝            -0.96
tispiece      -0.96
BIDDEN        -0.94
wnia          -0.90
мәләр         -0.88
ᚦ             -0.88
tellbar       -0.87
encantan      -0.84
рации         -0.82
Bikes         -0.82
Positive Logits
Token         Logit
comfort       1.70
interior      1.59
spacious      1.59
comfortable   1.52
features      1.46
space         1.41
safety        1.35
styling       1.34
looks         1.30
seating       1.28
Activations Density: 0.032%