INDEX
Explanations
references to specific authors and their work in academic papers
Initials followed by a period or comma
authors' initials and names
Negative Logits
]`               -1.00
^(@)             -0.98
autorytatywna    -0.96
$")              -0.95
}}$}             -0.94
.")]             -0.94
>\<^             -0.94
%");             -0.93
%")              -0.93
"]}              -0.92
Positive Logits
stesso           0.68
himself          0.64
,                0.63
.                0.53
stessa           0.53
.,               0.51
Himself          0.50
stessi           0.50
(                0.45
'                0.44
Activations Density 0.205%