INDEX
Explanations
mentions of the word "Sen" followed by a number
Negative Logits
Token      Logit
Dawn       -0.73
Reference  -0.64
Heard      -0.64
selves     -0.63
ATHER      -0.63
Carmen     -0.63
Cind       -0.62
Bearing    -0.61
ORN        -0.60
Xander     -0.59
Positive Logits
Token      Logit
iors       1.52
egal       1.49
pai        1.47
seless     1.39
eca        1.31
escent     1.31
hov        1.10
ryu        1.07
esc        1.02
esis       0.99
Activations Density 0.030%