INDEX
Explanations
phrases involving statements or opinions
instances of the word "the."
Negative Logits
contained     -0.75
ounces        -0.75
thood         -0.72
Ò             -0.70
perse         -0.69
imi           -0.69
resembling    -0.68
rehend        -0.67
Includes      -0.66
overseen      -0.66
Positive Logits
oret          1.60
resa          1.32
downside      1.23
easiest       1.23
biggest       1.20
reason        1.19
simplest      1.17
irony         1.15
ories         1.14
problem       1.13
Activations Density: 0.496%