Explanations
mentions of large quantities of entities or activities
instances of the word "scores" followed by numerical values or references to quantities
Negative Logits
Token        Logit
Forge        -0.67
ned          -0.65
deduction    -0.63
ulkan        -0.62
dissolution  -0.62
UAL          -0.62
iator        -0.61
Correction   -0.60
necessity    -0.59
does         -0.59
Positive Logits
Token        Logit
paces        0.99
dozen        0.93
poons        0.93
imilar       0.90
thousand     0.88
dozen        0.83
omething     0.82
arnaev       0.80
everal       0.78
chool        0.78
Activations Density: 0.016%