INDEX
Explanations
references to specific locations or entities followed by an action or state
occurrences of the word "the" in various contexts
Negative Logits
thood       -0.87
eed         -0.81
because     -0.78
Ò           -0.76
besides     -0.74
verage      -0.72
plete       -0.72
leeve       -0.71
tsy         -0.69
ago         -0.69
Positive Logits
slightest   1.03
biggest     0.98
entire      0.96
majority    0.95
simplest    0.95
entirety    0.95
temptation  0.94
easiest     0.91
greatest    0.91
likelihood  0.90
Activations Density 0.279%