INDEX
Explanations
references to specific characters and relationships in narratives
NEGATIVE LOGITS
èle      -0.18
865      -0.16
227      -0.16
Abrams   -0.14
achat    -0.14
ron      -0.14
795      -0.14
roje     -0.14
Leonard  -0.14
gary     -0.14
POSITIVE LOGITS
صÙĩ        0.14
lassen     0.14
auc        0.14
ниÑĨÑĥ     0.14
supern     0.14
/column    0.13
ิว         0.13
ereotype   0.13
enk        0.13
eut        0.13
Activations Density 0.305%