INDEX
Explanations
terms related to legal and political concepts
Negative Logits
belonging    -0.67
proceeding   -0.65
biting       -0.65
complying    -0.63
ruling       -0.61
wanting      -0.59
moving       -0.58
reading      -0.58
sticking     -0.58
remaining    -0.57
Positive Logits
ize      1.19
ify      1.15
ulate    1.07
inate    1.04
cknow    1.03
semble   1.02
izes     1.01
itate    1.00
ceive    0.99
ulates   0.99
Activations Density: 0.301%