Explanations
the word "most"
Negative Logits
<bos>     -1.03
“         -0.68
          -0.68
the       -0.66
<em>      -0.65
"         -0.64
-         -0.61
“         -0.61
'         -0.59
go        -0.58
Positive Logits
Efq       1.42
myſelf    1.34
Anſ       1.29
Eſ        1.27
Majefty   1.23
Houſe     1.23
تقاوى     1.23
itſelf    1.22
Jefus     1.20
Reſ       1.20
Activations Density: 0.367%