INDEX
Explanations
proper nouns or names, particularly those of people
Negative Logits
Token     Logit
']],      -0.92
}}$}      -0.89
".        -0.86
'],       -0.85
)++;      -0.84
"]];      -0.84
')],      -0.83
."]       -0.82
))));     -0.82
'>";      -0.82
Positive Logits
Token     Logit
Jr        0.64
Przypisy  0.55
,         0.55
(         0.54
'         0.46
1         0.46
&         0.46
(         0.46
mann      0.44
b         0.44
Activations Density 0.375%