INDEX
Explanations
mentions of names and titles, particularly in a context related to specific individuals
Negative Logits
-	-0.68
"	-0.61
/	-0.60
“	-0.59
‘	-0.59
-	-0.58
	-0.57
(	-0.57
'	-0.56
–	-0.54
Positive Logits
Efq	1.17
Majefty	1.05
՚	1.05
^(@)	1.01
Monfieur	1.01
Theſe	0.99
myſelf	0.99
ſind	0.98
ſelves	0.97
propOrder	0.95
Activations Density 0.120%