Explanations
words related to legal rights and actions
the definite article "the"
Negative Logits
Token          Logit
thood          -0.79
iffe           -0.70
leeve          -0.59
suppose        -0.58
gat            -0.58
illon          -0.58
advertising    -0.55
den            -0.55
ius            -0.55
IDs            -0.54
Positive Logits
Token          Logit
ses             1.13
same            1.11
quickest        1.07
slightest       1.05
longest         1.04
hardest         1.04
fastest         1.02
entirety        0.97
way             0.97
entire          0.94
Activation Density: 0.279%
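The page does not define activation density. A common reading is the percentage of tokens in the evaluated dataset on which the feature fires. The sketch below assumes "fires" means an activation strictly above a threshold of zero. The function name `activation_density` and its `threshold` parameter are hypothetical and are not taken from any particular tool.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of tokens whose feature activation exceeds `threshold`.

    Assumption: a token counts as "active" when its activation is strictly
    greater than `threshold` (0.0 by default).
    """
    if not activations:
        raise ValueError("need at least one activation value")
    # Count tokens on which the feature fires.
    active = sum(1 for a in activations if a > threshold)
    # Express the count as a percentage of all tokens seen.
    return 100.0 * active / len(activations)


if __name__ == "__main__":
    # One of four tokens is active, so the density is 25%.
    print(activation_density([0.0, 0.0, 3.2, 0.0]))
```

Under this reading, a density of 0.279% means the feature is active on roughly 279 of every 100,000 tokens.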