Explanations
Titles, particularly those containing the word "The", or words related to Black people
Negative Logits
<bos>           -1.11
RegressionTest  -0.66
')):            -0.65
"}>             -0.59
'):             -0.59
$")             -0.58
")){            -0.57
nemlig          -0.57
Guys            -0.57
'})             -0.57
Positive Logits
Ruhm           0.62
joaat          0.60
энциклопедия   0.60
Peasant        0.59
EdgeInsets     0.59
EndContext     0.58
henvisninger   0.58
adultery       0.58
pertory        0.57
interrogation  0.56
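Dashboards like this one commonly produce the two logit lists by projecting the feature's decoder direction through the model's unembedding matrix and keeping the k lowest and k highest scoring tokens. The sketch below assumes that method. The page itself does not say how its lists were computed, and the names `top_logits`, `decoder_direction`, and `unembed`, along with the toy vocabulary, are hypothetical.

```python
from typing import Dict, List, Tuple


def top_logits(
    decoder_direction: List[float],
    unembed: Dict[str, List[float]],
    k: int,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return (lowest, highest) token lists for a feature direction.

    Hypothetical helper: each token's score is the dot product of the
    feature's decoder direction with that token's unembedding vector.
    """
    scores = {
        token: sum(d * u for d, u in zip(decoder_direction, vector))
        for token, vector in unembed.items()
    }
    ranked = sorted(scores.items(), key=lambda item: item[1])
    lowest = ranked[:k]           # most negative scores first
    highest = ranked[::-1][:k]    # most positive scores first
    return lowest, highest


# Toy 2-dimensional "model" with four tokens (invented values).
unembed = {
    "a": [1.0, 0.0],
    "b": [0.0, 1.0],
    "c": [-1.0, 0.0],
    "d": [0.5, 0.5],
}
neg, pos = top_logits([2.0, 1.0], unembed, k=2)
print(neg)  # [('c', -2.0), ('b', 1.0)]
print(pos)  # [('a', 2.0), ('d', 1.5)]
```

With a real model the vocabulary has tens of thousands of entries, so the "lowest" list is genuinely negative, as in the table above.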
Activation Density: 1.463%
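Activation density is usually reported as the share of sampled tokens on which the feature's activation is above zero. A minimal sketch, assuming a flat list of per-token activations and a hypothetical helper `activation_density`. The sample size and threshold behind the 1.463% figure are not given on this page.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds `threshold`."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# Toy example (invented data): 3 of 200 tokens fire.
acts = [0.0] * 197 + [0.4, 1.2, 2.5]
print(f"{activation_density(acts):.3f}%")  # 1.500%
```

A low density like this one means the feature is silent on the vast majority of tokens.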