INDEX
Explanations
the word "her" with high activation
the word "her" in various contexts
Negative Logits
Token            Logit
Reconstruction   -0.69
Governments      -0.61
cancell          -0.56
TTC              -0.55
stopp            -0.55
Enhance          -0.53
Origins          -0.53
tilt             -0.53
CVE              -0.53
Quarterly        -0.52
Positive Logits
Token            Logit
her              4.74
hers             3.26
HER              2.45
hes              2.08
hest             2.02
hed              1.72
hel              1.68
hen              1.63
heres            1.61
here             1.56
Activations Density: 0.014%