INDEX
Explanations
quantities or numbers
questions or statements that inquire about quantities
Negative Logits
hern     -0.76
UAL      -0.72
afort    -0.70
olitics  -0.70
ivism    -0.67
Mobil    -0.66
etry     -0.66
avior    -0.64
owder    -0.64
aband    -0.63
Positive Logits
times        1.23
thousand     0.96
calories     0.95
servings     0.93
hundred      0.91
instances    0.88
copies       0.88
hours        0.87
people       0.87
parentheses  0.87
Activations Density 0.045%