INDEX
Explanations
mentions of the body part "fingers"
references to fingers and thumbs
Negative Logits
[|            -0.78
judicial      -0.70
house         -0.69
Det           -0.68
nz            -0.67
public        -0.66
houses        -0.66
Constantin    -0.65
nce           -0.65
spect         -0.64
Positive Logits
fingers       1.19
ingers        1.14
pring         1.12
mith          0.99
aws           0.98
creen         0.97
leeve         0.91
thumb         0.89
hops          0.88
paws          0.84
Activations Density: 0.010%