INDEX
Explanations
references to financial values such as dollar amounts or percentages
the presence of the article "a" and its frequency in various contexts
Negative Logits
organs          -0.76
artifacts       -0.69
documents       -0.68
things          -0.65
mot             -0.65
fights          -0.65
tattoos         -0.64
preparations    -0.64
eyes            -0.64
eg              -0.63
Positive Logits
whopping        1.59
total           1.10
staggering      1.03
median          0.99
fraction        0.99
hefty           0.98
verages         0.97
ratio           0.96
mere            0.95
maximum         0.94
Activations Density 0.254%
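
A density figure like the one above is usually read as the fraction of sampled tokens on which the feature is active. The page does not say how it computes the value, so the sketch below is only one plausible reading: it assumes "active" means an activation strictly greater than zero. The function name `activation_density` and the sample data are hypothetical and chosen purely for illustration.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of activations strictly above `threshold`.

    Assumption: density = (number of active tokens) / (number of tokens).
    The source page does not confirm this definition.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 127 active tokens out of 50,000.
sample = [1.0] * 127 + [0.0] * (50_000 - 127)
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.254%
```

Under this reading, a density of 0.254% means the feature fires on roughly one token in every 400.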