INDEX
Explanations
phrases indicating the presence of a supporting party or source
Negative Logits
الإنجليزية        -0.74
help              -0.69
GIP               -0.67
InstrumentedTest  -0.66
IsContent         -0.66
BSEB              -0.65
Bewußt            -0.63
ACI               -0.63
ťaž               -0.62
beaver            -0.62
Positive Logits
the       0.89
an        0.78
a         0.69
another   0.61
with      0.59
Demikian  0.57
all       0.56
both      0.56
two       0.55
Berikut   0.54
Activations Density: 0.316%