INDEX
Explanations
phrases indicating collective responsibility or action
collective phrases emphasizing communal experiences and needs
Negative Logits
luster      -0.63
arresting   -0.59
limit       -0.59
exclusion   -0.58
ahime       -0.58
FN          -0.57
LR          -0.55
none        -0.54
promotion   -0.54
flation     -0.54
Positive Logits
alike       1.03
ocating     0.99
agree       0.89
know        0.88
uding       0.84
kinds       0.83
owe         0.80
remember    0.77
know        0.76
knew        0.76
Activations Density: 0.034%