Explanations
phrases denoting a large quantity or number of something
quantifiers or terms indicating frequency and quantity
Negative Logits
istan          -0.69
ESCO           -0.67
Constructed    -0.66
nature         -0.65
axis           -0.60
cause          -0.58
idden          -0.58
KE             -0.58
idd            -0.57
SR             -0.56
Positive Logits
incrim          0.77
valuable        0.76
additional      0.75
amounts         0.72
meaningful      0.71
useful          0.70
interesting     0.69
worthwhile      0.69
impressive      0.69
amount          0.68
Activation Density: 0.341%