INDEX
Explanations
references to people, particularly those involved in various roles and contexts
Negative Logits
之一      -0.15
oder      -0.14
áce       -0.13
unts      -0.13
ügen      -0.13
ρη        -0.13
clud      -0.13
ager      -0.13
embros    -0.13
gether    -0.13
Positive Logits
with        0.23
who         0.22
without     0.19
everywhere  0.19
with        0.18
whose       0.18
nào         0.17
-first      0.16
without     0.16
who         0.16
Activations Density 0.310%