INDEX
Explanations
references to familial and social relationships in narratives
Negative Logits
alla
-0.16
.Quad
-0.15
å¨ĺ
-0.15
нÑİ
-0.14
="{!!
-0.14
795
-0.14
sson
-0.14
rames
-0.14
529
-0.14
447
-0.14
Positive Logits
gre
0.18
شر
0.17
ÙĪØ³Ø·
0.17
gre
0.15
NCY
0.15
achs
0.14
Byron
0.14
ppo
0.14
aires
0.13
Gre
0.13
Activations Density 0.395%