INDEX
Explanations
mentions of automotive safety features and their effectiveness
Negative Logits
leneck      -0.15
.gdx        -0.15
.fm         -0.15
(___        -0.14
ì£          -0.14
anje        -0.14
lene        -0.14
ednou       -0.14
목          -0.14
ÙģØ§Ø±      -0.14
Positive Logits
safety      0.20
features    0.20
feature     0.18
Safety      0.17
automatic   0.17
åĥį         0.16
signal      0.16
advanced    0.16
hon         0.15
rear        0.15
Activations Density: 0.018%