INDEX
Explanations
mentions of the body part "rib"
mentions of ribs or rib-related terms
Negative Logits
Sunny      -0.77
terday     -0.75
SOS        -0.74
Claus      -0.70
GOODMAN    -0.69
Ceres      -0.67
eanor      -0.64
chool      -0.63
CHO        -0.63
Valhalla   -0.63
Positive Logits
bons       1.51
bed        1.03
bing       1.01
bent       0.97
bon        0.94
bones      0.89
fed        0.89
bled       0.88
bard       0.88
bit        0.87
Activations Density: 0.017%