INDEX
Explanations
mentions of the word "her" in various contexts
the repeated use of the word "her."
Negative Logits
CCC          -0.65
ouls         -0.64
sew          -0.62
rolling      -0.58
transformer  -0.58
RAW          -0.57
trim         -0.56
caps         -0.56
Bulk         -0.56
govtrack     -0.56
Positive Logits
itage        1.55
ding         1.10
itance       1.02
ald          0.98
der          0.97
tz           0.97
rha          0.96
jee          0.93
pes          0.90
rer          0.90
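Logit lists like the two above are commonly produced by projecting the feature's decoder direction through the model's unembedding matrix and keeping the k most negative and k most positive token scores. The sketch below illustrates that idea under that assumption, ignoring any final normalization a real model may apply. The function names, the toy two-dimensional weights, and the placeholder tokens are hypothetical and are not taken from this feature.

```python
# Minimal sketch: score every vocabulary token by the dot product of a
# feature's decoder direction with the matching column of the unembedding.
# All weights and tokens here are hypothetical toy values.

def logit_effects(decoder_dir, W_U):
    """Return one score per token: decoder_dir (d_model) @ W_U (d_model x vocab)."""
    d_model = len(decoder_dir)
    vocab = len(W_U[0])
    return [sum(decoder_dir[i] * W_U[i][t] for i in range(d_model)) for t in range(vocab)]

def top_bottom_k(scores, tokens, k):
    """Return (k most negative, k most positive) (token, score) pairs."""
    ranked = sorted(zip(tokens, scores), key=lambda pair: pair[1])
    bottom = ranked[:k]
    top = ranked[::-1][:k]
    return bottom, top

# Hypothetical toy model: d_model = 2, vocabulary of 4 placeholder tokens.
tokens = ["tok_a", "tok_b", "tok_c", "tok_d"]
decoder_dir = [1.0, -0.5]
W_U = [
    [0.2, 0.1, 0.9, -0.4],
    [0.0, 0.4, -0.2, 0.2],
]

scores = logit_effects(decoder_dir, W_U)
negative, positive = top_bottom_k(scores, tokens, k=2)
print("Negative:", negative)
print("Positive:", positive)
```

With these toy values, tok_d gets the most negative score and tok_c the most positive, mirroring how CCC and itage head the lists above.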
Activation Density 0.021%
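Activation density is usually read as the fraction of sampled token positions on which the feature fires, so 0.021% corresponds to roughly 21 firing positions per 100,000 tokens. The sketch below assumes "fires" means an activation strictly above zero; the threshold and the activation sample used for this dashboard are assumptions, and the function name is hypothetical.

```python
# Minimal sketch: activation density = (# positions with activation > threshold) / (# positions).
# The threshold of 0.0 is an assumption about how "firing" is defined.

def activation_density(activations, threshold=0.0):
    """Fraction of token positions where the feature's activation exceeds threshold."""
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)

# Hypothetical sample: 100,000 token positions, the feature fires on 21 of them.
acts = [0.0] * 99_979 + [1.0] * 21
density = activation_density(acts)
print(f"{density:.3%}")
```

A density this low indicates a sparse feature that activates only in narrow contexts, consistent with explanations tied to a single word.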