INDEX
Explanations
phrases indicating comparison or contrast
connections between individual and community experiences or actions
Negative Logits
Token      Logit
typo       -0.65
Gleaming   -0.64
hottest    -0.63
Ice        -0.63
estamp     -0.62
Bullets    -0.61
bright     -0.61
Clicker    -0.61
atical     -0.60
START      -0.60
Positive Logits
Token         Logit
collectively  1.11
collective    1.10
communal      1.00
group         1.00
jointly       0.97
group         0.92
grouped       0.91
andem         0.90
shared        0.89
others        0.88
Activations Density: 0.183%