Explanations
instances where something is described as having more or less of a particular quality or characteristic
phrases emphasizing the concept of "more of" something
Negative Logits
alez      -0.65
ynes      -0.64
yrs       -0.62
zai       -0.62
overe     -0.62
TOP       -0.59
livest    -0.59
withd     -0.59
months    -0.57
uden      -0.56
Positive Logits
course     0.86
course     0.85
an         0.85
erous      0.85
them       0.80
a          0.76
what       0.74
ourselves  0.73
us         0.72
those      0.72
Activations Density 0.075%