INDEX
Explanations
references to individuals or groups in general terms
Negative Logits
χ           -0.75
δ           -0.67
SK          -0.66
δ           -0.65
Mot         -0.64
Mot         -0.63
d           -0.63
tas         -0.62
Δ           -0.62
TagHelpers  -0.62
Positive Logits
Nadie      1.44
anyone     1.35
everyone   1.34
everybody  1.33
nobody     1.29
Everyone   1.29
Everyone   1.29
perſon     1.28
Everybody  1.28
someone    1.26
Activations Density: 0.048%