Explanations
references to physical scars
references to scars, both physical and metaphorical
NEGATIVE LOGITS
Unch            -0.64
eering          -0.63
eers            -0.61
Bulldogs        -0.59
ysis            -0.59
stabilization   -0.57
arantine        -0.57
deliveries      -0.57
desperate       -0.57
Standards       -0.57
POSITIVE LOGITS
red             1.27
crow            1.08
lett            1.07
ring            1.07
ves             1.06
fing            1.06
fed             1.02
face            1.02
lets            1.01
nton            1.01
Activations Density: 0.032%