INDEX
Explanations
references to facts or statements being presented
the presence of forms of the verb "to be."
Negative Logits
¢                     -0.62
inav                  -0.62
ogether               -0.59
aturdays              -0.59
congr                 -0.58
oples                 -0.57
summarizes            -0.56
iew                   -0.55
Jol                   -0.53
urry                  -0.53
Positive Logits
nt                     0.93
NOT                    0.82
incapable              0.78
actually               0.78
liable                 0.77
disproportionately     0.75
not                    0.75
VERY                   0.74
unable                 0.73
somehow                0.73
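Rankings like the two lists above are commonly produced by projecting the feature's output (decoder) direction onto each token's unembedding vector and sorting the resulting scores. The sketch below assumes that definition; the dashboard's exact computation is not stated here, and the function name `top_logit_tokens`, the toy vectors, and the placeholder tokens `a` to `d` are hypothetical.

```python
def top_logit_tokens(direction, unembed, vocab, k):
    """Rank tokens by the dot product of a feature direction with each
    token's unembedding vector (assumed definition of a logit effect).

    direction: list of floats, the feature's output direction (hypothetical)
    unembed:   one vector per token, same length as direction
    vocab:     token strings, aligned with unembed
    k:         how many tokens to return at each end
    """
    scores = []
    for token, vector in zip(vocab, unembed):
        score = sum(d * u for d, u in zip(direction, vector))
        scores.append((token, score))
    scores.sort(key=lambda pair: pair[1])   # most negative first
    negative = scores[:k]
    positive = scores[::-1][:k]             # most positive first
    return positive, negative


# Toy example with made-up 2-dimensional vectors (not real model weights).
pos, neg = top_logit_tokens(
    direction=[1.0, -0.5],
    unembed=[[0.6, -0.4], [0.5, -0.3], [-0.3, 0.4], [-0.2, 0.5]],
    vocab=["a", "b", "c", "d"],
    k=2,
)
print([t for t, _ in pos])  # → ['a', 'b']
print([t for t, _ in neg])  # → ['c', 'd']
```

Under this reading, a positive value means the feature pushes the model toward predicting that token and a negative value means it pushes away from it.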
Activation Density: 0.721%
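Activation density is usually read as the share of sampled token positions on which the feature fires, so 0.721% corresponds to roughly 7 of every 1,000 tokens. The sketch below assumes "fires" means an activation strictly above zero; the threshold and token sample behind the dashboard figure are not given here, and `activation_density` is a hypothetical helper.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of token positions whose activation exceeds `threshold`.

    Assumes a feature 'fires' when its activation is strictly greater
    than the threshold (0.0 by default).
    """
    if not activations:
        raise ValueError("need at least one activation value")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# Toy activations for four token positions; only one is positive.
print(activation_density([0.0, 0.0, 1.3, 0.0]))  # → 25.0
```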