Explanations
conjunctions and phrases indicative of conditions or consequences
Negative Logits
<bos>           -0.76
שְׁ              -0.66
brainly         -0.65
faſt            -0.65
Diony           -0.63
setId           -0.62
unauthorised    -0.61
∛               -0.61
ſur             -0.61
aryen           -0.61
Positive Logits
,               1.02
.,              0.83
but             0.83
however         0.78
%,              0.78
(),             0.77
ViewFeatures    0.77
′,              0.77
,-,             0.77
*,              0.76
Activation Density: 2.379%