INDEX
Explanations
instances of the letter 'e' in varying frequencies
Negative Logits
Anſ             -0.73
Houſe           -0.71
pleaſure        -0.67
Reſ             -0.66
ſelf            -0.65
ScopeManager    -0.65
bershka         -0.64
Conſ            -0.63
Majefty         -0.63
myſelf          -0.63
Positive Logits
<bos>           0.57
E               0.57
e               0.56
ggf             0.51
g               0.51
i               0.50
ce              0.49
k               0.49
cf              0.48
E               0.47
Activations Density 0.022%
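
For readers unfamiliar with the density figure, here is a minimal sketch of how such a number could be computed, assuming it means the fraction of token positions on which the feature's activation is above zero. The function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and chosen only for illustration; they are not taken from the dashboard's own code or data.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" means the share of tokens on which the feature fires.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 100,000 token positions, feature active on 22 of them.
sample = [0.0] * 99_978 + [1.0] * 22
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.022%
```

Under this reading, a density of 0.022% corresponds to the feature firing on roughly 22 of every 100,000 tokens.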