Explanations
indefinite articles followed by nouns
phrases that quantify or reference amounts
Negative Logits
icism      -0.89
matter     -0.83
brance     -0.82
cation     -0.80
ileaks     -0.79
ernaut     -0.76
terness    -0.74
antry      -0.74
rack       -0.72
nesday     -0.72
Positive Logits
constants          1.11
exceptions         1.04
types              1.04
indications        1.04
instances          0.94
representations    0.93
batches            0.93
clauses            0.93
roles              0.92
winners            0.92
Activations Density 0.277%