Explanations
the repeated use of the word "And"
Negative Logits
"");      -0.68
chi̍t      -0.67
'),       -0.66
lenker    -0.64
"]        -0.63
"         -0.62
')        -0.61
)";       -0.61
');       -0.61
”]        -0.61
Positive Logits
And       3.09
And       2.77
AND       1.44
Or        1.26
Και       1.09
Or        1.07
Và        1.06
But       1.01
Και       0.98
Of        0.95
Activations Density 0.069%