INDEX
Explanations
phrases emphasizing importance or significance
instances of the word "the"
Negative Logits
existed         -0.74
exists          -0.71
include         -0.70
dale            -0.70
subscrib        -0.67
notes           -0.66
illion          -0.66
rift            -0.65
partake         -0.64
corresponds     -0.64
Positive Logits
easiest          1.16
safest           1.06
culmination      1.05
simplest         1.04
same             1.01
understatement   0.95
smartest         0.94
greatest         0.92
toughest         0.91
cheapest         0.91
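Dashboards of this kind usually derive the two logit lists from the feature's direct effect on the output: the feature's decoder vector is multiplied by the model's unembedding matrix, and the resulting per-token scores are sorted. The sketch below assumes that convention. The names `decoder_dir`, `W_U`, and `vocab` are hypothetical, and a real tool may normalize the decoder vector or fold in the final layer norm before projecting.

```python
import numpy as np


def top_logit_effects(decoder_dir, W_U, vocab, k=10):
    """Return the k most negative and k most positive tokens for one feature.

    Assumed shapes (hypothetical, for illustration):
      decoder_dir: (d_model,)          the feature's decoder vector
      W_U:         (d_model, d_vocab)  the model's unembedding matrix
      vocab:       list of d_vocab token strings
    """
    # Direct effect of the feature direction on every output logit.
    effects = decoder_dir @ W_U  # shape (d_vocab,)
    order = np.argsort(effects)  # indices sorted by ascending effect
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return negative, positive


# Toy example with a 2-dimensional residual stream and a 4-token vocabulary.
decoder_dir = np.array([1.0, 0.0])
W_U = np.array([[0.5, -0.7, 1.2, 0.0],
                [9.0, 9.0, 9.0, 9.0]])
vocab = ["same", "existed", "easiest", "the"]
neg, pos = top_logit_effects(decoder_dir, W_U, vocab, k=2)
print(neg)
print(pos)
```

Because only the first residual dimension is nonzero in the toy decoder vector, the second row of `W_U` has no effect, and the lists simply rank the first row.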
Activation Density: 0.081%
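The density figure is presumably the fraction of sampled token positions at which the feature's activation is above zero, so 0.081% corresponds to roughly 81 firing positions per 100,000 tokens. The sketch below assumes that definition; the `threshold` parameter is a hypothetical knob, and the dashboard's actual sampling and threshold are not stated on this page.

```python
import numpy as np


def activation_density(activations, threshold=0.0):
    """Fraction of token positions where the feature fires (activation > threshold)."""
    activations = np.asarray(activations)
    return float(np.mean(activations > threshold))


# Toy example: a feature that fires on 81 of 100,000 token positions.
acts = np.zeros(100_000)
acts[:81] = 1.0
density = activation_density(acts)
print(f"{density:.3%}")
```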