INDEX
Explanations
short phrases indicating communication or action
punctuation and specific short phrases within sentences
Negative Logits
refinement     -0.77
flattened      -0.71
blended        -0.69
collisions     -0.68
churn          -0.67
scattered      -0.65
eren           -0.65
athered        -0.64
frying         -0.62
accumulated    -0.62
Positive Logits
Otherwise       0.93
ASAP            0.90
Currently       0.84
Would           0.82
ļéĨĴ            0.81
Specifically    0.79
Allows          0.76
Please          0.76
Ideally         0.76
Otherwise       0.75
Activations Density: 0.712%