Explanations
phrases related to explaining or confirming information
instances of the word "to" indicating explanations or directives
Negative Logits
Token          Logit
downtime       -0.69
distractions   -0.64
botched        -0.61
wasted         -0.59
mobility       -0.59
coupled        -0.57
exposures      -0.56
pitted         -0.56
wasting        -0.56
simultane      -0.56
Positive Logits
Token          Logit
wered          1.12
ilet           1.09
pless          0.99
ggles          0.97
othy           0.96
ads            0.89
satisfy        0.86
asted          0.86
adies          0.85
psy            0.84
Activations Density: 0.273%
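
For orientation, here is a minimal sketch of how a density figure like the one above could be computed. It assumes that density means the fraction of sampled tokens on which the feature's activation is above zero; the page itself does not state its definition. The function name `activation_density`, the `threshold` parameter, and the toy data are all hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" is the share of tokens on which the feature fires.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical toy data: the feature fires on 3 of 1000 tokens.
toy_activations = [0.0] * 997 + [1.2, 0.4, 2.5]
density = activation_density(toy_activations)
print(f"{density:.3%}")
```

Under that assumption, a density of 0.273% would correspond to the feature firing on roughly 27 of every 10,000 tokens.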