INDEX
Explanations
phrases indicating limitations or constraints
phrases that indicate a limitation or restriction
Negative Logits
ult          -0.69
auga         -0.67
ameron       -0.67
robat        -0.63
ashington    -0.63
axis         -0.63
yssey        -0.63
umping       -0.61
itch         -0.61
loo          -0.60
Positive Logits
spor          0.91
marginally    0.76
ONE           0.73
insofar       0.70
superficial   0.67
finite        0.67
sporadic      0.66
fraction      0.66
iffe          0.66
ifiable       0.66
Activations Density: 0.400%