INDEX
Explanations
phrases related to groups of people or general statements about individuals
Negative Logits
Token         Logit
tnc           -0.79
iger          -0.75
ult           -0.69
ories         -0.65
urations      -0.64
Resurrection  -0.63
urer          -0.61
FU            -0.60
urrection     -0.60
QL            -0.60
Positive Logits
Token         Logit
else          2.01
Else          1.42
else          1.40
Else          1.30
who           1.04
knows         0.93
acquainted    0.90
involved      0.84
remembers     0.82
alike         0.82
Activations Density: 0.431%