INDEX
Explanations
phrases related to societal issues or challenges
references to government policies and social support systems
Negative Logits
conjecture    -0.82
Prototype     -0.66
iple          -0.65
Readers       -0.65
ovember       -0.65
Shares        -0.60
(@            -0.60
rall          -0.59
confusion     -0.59
batches       -0.59
Positive Logits
their          1.05
their          0.90
Their          0.81
they           0.80
they           0.79
THEIR          0.76
They           0.76
THEY           0.75
hower          0.74
reatment       0.74
Activations Density 1.282%
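
The density figure above can be read as the share of sampled token positions on which the feature fires. The page does not say how it was computed, so the sketch below is only an illustration under that assumption. The function name `activation_density`, the `threshold` parameter, and the sample values are all hypothetical and do not come from the source.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of token positions where the
    feature is active (activation strictly greater than the threshold).
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Made-up activations for ten token positions; only one is non-zero.
sample = [0.0] * 9 + [0.7]
print(f"{activation_density(sample):.3%}")
```

Under this reading, a density of 1.282% would mean the feature is active on roughly 13 of every 1,000 sampled tokens.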