INDEX
Explanations
specific proper nouns or significant names in the text
Negative Logits
“           -0.63
‘           -0.56
aber        -0.53
des         -0.53
"           -0.51
(blank)     -0.51
uris        -0.50
und         -0.49
ального     -0.47
-           -0.47
Positive Logits
Efq         1.24
Monfieur    1.09
Majefty     1.07
՚           0.98
myſelf      0.97
ſelf        0.96
ſind        0.94
pleaſure    0.93
Reſ         0.92
Theſe       0.92
Activations Density 0.258%