INDEX
Explanations
first-person pronouns and references to personal experiences
Negative Logits
themselves    -0.27
their         -0.20
Their         -0.20
Their         -0.19
their         -0.18
leurs         -0.17
holm          -0.17
reuse         -0.16
ij            -0.15
osi           -0.15
Positive Logits
alike         0.17
ago           0.16
AO            0.15
onor          0.14
iena          0.14
edy           0.14
ags           0.14
zelf          0.14
eren          0.14
/plugin       0.13
Activations Density 0.087%