INDEX
Explanations
references to unspecified individuals or groups
Negative Logits
sed        -0.16
ts         -0.16
wner       -0.15
aries      -0.15
lington    -0.14
ÙĤر        -0.14
ted        -0.14
endor      -0.14
tn         -0.13
uyen       -0.13
Positive Logits
who        0.26
else       0.25
hood       0.20
who        0.19
age        0.18
Who        0.18
whom       0.17
_else      0.17
Who        0.17
/group     0.16
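The page does not say how these logit values are computed. One common approach for sparse-autoencoder features, which is assumed here and not confirmed by the dashboard, is to multiply the feature's decoder direction by the model's unembedding matrix and then list the tokens with the largest and smallest results. The following is a plain-Python sketch of that idea. The function names `logit_effects` and `top_and_bottom`, the toy tokens, and all of the numbers are hypothetical.

```python
from typing import List, Sequence, Tuple


def logit_effects(decoder_dir: Sequence[float],
                  w_unembed: Sequence[Sequence[float]]) -> List[float]:
    """Project a feature's decoder direction (length d_model) through an
    unembedding matrix stored as d_model rows x vocab_size columns.
    Hypothetical helper; assumes the dashboard uses this projection."""
    if len(decoder_dir) != len(w_unembed):
        raise ValueError("decoder_dir length must equal the number of rows in w_unembed")
    vocab_size = len(w_unembed[0])
    # One dot product per vocabulary column.
    return [
        sum(decoder_dir[d] * w_unembed[d][v] for d in range(len(decoder_dir)))
        for v in range(vocab_size)
    ]


def top_and_bottom(effects: Sequence[float], tokens: Sequence[str],
                   k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and the k most negative (token, effect) pairs."""
    ranked = sorted(zip(tokens, effects), key=lambda pair: pair[1])
    positive = list(reversed(ranked))[:k]  # largest effects first
    negative = ranked[:k]                  # most negative effects first
    return positive, negative


# Toy example with made-up numbers: d_model = 2, vocabulary of 4 tokens.
tokens = ["tok_a", "tok_b", "tok_c", "tok_d"]
decoder_dir = [1.0, 0.5]
w_unembed = [
    [0.2, -0.1, 0.0, 0.3],
    [0.1, 0.0, -0.4, 0.2],
]
effects = logit_effects(decoder_dir, w_unembed)
positive, negative = top_and_bottom(effects, tokens, k=2)
for token, value in positive + negative:
    print(f"{token:8s} {value:+.2f}")
```

With a real model, `decoder_dir` would be one row of the autoencoder's decoder weights and `w_unembed` the model's unembedding matrix. In that case a vectorized matrix product would replace the explicit loops.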
Activation Density: 0.061%
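Activation density is read here as the percentage of token positions on which the feature is active, meaning its activation is above zero. That definition is an assumption, because the page states neither the threshold nor the dataset. Under this reading, 0.061% means the feature fires on about 61 of every 100,000 tokens. The following is a minimal sketch that uses a hypothetical `activation_density` helper and made-up activations.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation exceeds `threshold`.
    Hypothetical helper; the zero threshold is an assumption."""
    acts = list(activations)
    if not acts:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in acts if a > threshold)
    return 100.0 * fired / len(acts)


# Made-up activations: the feature fires on 61 of 100,000 tokens.
acts = [0.0] * 99_939 + [1.2] * 61
print(f"Activation Density: {activation_density(acts):.3f}%")
```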