Explanations
instances of the word "of" being followed by a number
the phrase "many of" to indicate a large group or collection
Negative Logits
disadvant      -0.71
WATCHED        -0.68
destro         -0.68
FTWARE         -0.63
inconsistency  -0.58
bage           -0.57
artifacts      -0.56
leans          -0.55
confir         -0.55
mone           -0.54
Positive Logits
us             1.33
our            0.92
those          0.90
them           0.89
today          0.89
America        0.87
these          0.87
Us             0.80
whom           0.79
ours           0.79
Activations Density: 0.092%