Explanations
phrases indicating quantity or frequency
adjectives describing different characteristics or states
Negative Logits
alion      -0.64
icism      -0.63
culosis    -0.63
bara       -0.62
CRE        -0.62
erity      -0.59
APTER      -0.58
EMA        -0.57
Barrett    -0.56
cation     -0.56
Positive Logits
themselves         0.89
interchangeable    0.79
abouts             0.78
selves             0.72
substitutes        0.70
equivalents        0.70
extensions         0.69
types              0.69
outl               0.68
exceptions         0.66
Activations Density: 0.924%