INDEX
Explanations
references to interventions or processes in academic texts
Negative Logits
}}$}          -1.17
rungsseite    -1.15
)");          -1.15
."));         -1.07
LookAnd       -1.05
)"),          -1.02
")));         -1.02
Италијани     -1.00
'}),          -0.98
".            -0.95
Positive Logits
—             0.84
-             0.78
–             0.71
—             0.71
\             0.71
[             0.68
--            0.67
|             0.66
[             0.65
.—            0.64
Activations Density 0.233%