Explanations
phrases related to legal agreements and government actions
Negative Logits
ados         -0.78
(#           -0.73
buster       -0.71
itars        -0.71
more         -0.71
without      -0.70
split        -0.70
followed     -0.70
according    -0.69
="#          -0.69
Positive Logits
latter          1.45
slightest       1.42
aforementioned  1.42
same            1.34
likes           1.32
highest         1.28
dreaded         1.21
wearer          1.21
widest          1.20
ses             1.19
Activations Density: 3.959%