Explanations
phrases related to driving or vehicles
Negative Logits
isd         -0.18
ilde        -0.17
ounds       -0.16
xin         -0.15
mach        -0.15
ares        -0.15
ary         -0.14
ipc         -0.14
iser        -0.14
sg          -0.14
Positive Logits
haft        0.29
.drive      0.20
urge        0.19
away        0.19
shaft       0.17
_license    0.16
/dr         0.16
-driving    0.16
ered        0.15
-drive      0.15
Activations Density: 0.032%