Explanations
phrases that emphasize inclusivity and universality of experiences
references to collective or inclusive terminology
Negative Logits
Wrath         -0.67
veyard        -0.67
iger          -0.67
word          -0.64
tnc           -0.63
odan          -0.63
eln           -0.63
testament     -0.62
Horses        -0.61
Resurrection  -0.61
Positive Logits
else          1.91
Else          1.30
else          1.21
except        1.08
involved      1.07
Else          1.06
imaginable    1.00
who           0.95
THING         0.87
conceivable   0.85
Activations Density 0.060%