INDEX
Explanations
phrases that express positive sentiment or approval
instances of the word "the," indicating a focus on definite references or common nouns in the text
Negative Logits
according      -0.74
imi            -0.73
inel           -0.71
thereby        -0.69
Ò              -0.69
voluntarily    -0.68
âĦ¢:           -0.68
âĢº            -0.67
âĢł            -0.67
ashington      -0.66
Positive Logits
oret           1.50
easiest        1.26
simplest       1.24
downside       1.21
biggest        1.21
slightest      1.13
resa           1.10
hardest        1.08
brightest      1.08
quickest       1.07
Activations Density 0.573%
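
For readers unfamiliar with the density figure, here is a minimal sketch of how such a number could be computed. It assumes that "activations density" means the fraction of sampled token positions on which the feature fires above zero; the page does not state its definition. The function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of activations strictly above `threshold`.

    Assumption: density is the share of token positions where the
    feature is active. This definition is not confirmed by the source.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 1000 token positions, 6 of which fire.
sample = [0.0] * 994 + [0.8, 1.2, 0.3, 2.1, 0.05, 0.6]
density = activation_density(sample)

# Format as a percentage with three decimals, matching the page's style.
print(f"Activations Density {density:.3%}")
```

Under this reading, a density of 0.573% would mean the feature is active on roughly 57 of every 10,000 sampled tokens.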