INDEX
Explanations
phrases indicating exceptions or limitations in eligibility or conditions
Negative Logits
esco      -0.16
undi      -0.16
emes      -0.15
hong      -0.15
uggage    -0.14
å°Ķ       -0.14
gest      -0.14
tương     -0.14
od        -0.14
zeug      -0.14
Positive Logits
exclusive      0.32
limited        0.32
exclusive      0.27
limited        0.24
Exclusive      0.24
exclusively    0.24
exhausted      0.24
Exclusive      0.23
exhaustive     0.22
restricted     0.22
Activations Density: 0.007%