INDEX
Explanations
terms related to policies, actions, or outcomes
Negative Logits
è¦            -0.78
Entered       -0.69
icipated      -0.67
Cosponsors    -0.63
awaru         -0.62
76561         -0.62
Joined        -0.62
utm           -0.61
ijah          -0.61
started       -0.61
Positive Logits
obsolete      1.26
easier        1.20
safer         1.14
seem          1.10
redundant     1.09
impossible    1.06
irrelevant    1.03
attractive    1.01
unus          1.00
harder        0.99
Activations Density 0.134%