INDEX
Explanations
questions or uncertainties expressed through the term "what."
Negative Logits
Efq               -0.80
تضيفلها           -0.78
saites            -0.67
ſelves            -0.65
myſelf            -0.65
itſelf            -0.64
ValueStyle        -0.63
Jefus             -0.62
shalt             -0.62
ValueGenerated    -0.61
Positive Logits
the               0.60
it                0.59
he                0.55
وتسجيلات          0.55
those             0.54
kind              0.53
exactly           0.53
exactly           0.51
to                0.50
soort             0.49
Activations Density 0.120%