Explanations
the word "back" with varying intensity across different contexts
Negative Logits
Parenthood    -0.77
inational     -0.71
risome        -0.68
ISION         -0.67
cules         -0.67
kish          -0.63
isions        -0.61
cular         -0.61
entric        -0.61
女            -0.61
Positive Logits
wards         1.21
lash          1.21
doors         1.03
door          1.03
packing       1.01
ward          1.00
GROUND        0.99
dated         0.95
haul          0.94
strap         0.94
Activations Density: 0.034%