INDEX
Explanations
phrases related to the act of taking action
occurrences of the word "to"
Negative Logits
Alleg           -0.59
laps            -0.59
Alive           -0.57
nevertheless    -0.56
disclosures     -0.56
Detected        -0.56
instit          -0.55
incurred        -0.55
indeed          -0.55
appropri        -0.55
Positive Logits
wered           1.47
ggles           1.46
asted           1.46
pless           1.35
othy            1.30
ffee            1.27
asts            1.26
asty            1.21
psy             1.19
asters          1.17
Activations Density: 0.128%