Explanations
references to people or groups impacted by events or situations
Negative Logits
dge      -0.16
rib      -0.16
il       -0.16
usk      -0.15
oyer     -0.15
oad      -0.15
dafür    -0.14
ณ        -0.14
thon     -0.14
tslib    -0.14
Positive Logits
directly    0.29
direct      0.22
Direct      0.21
Direct      0.21
direct      0.19
indirect    0.19
irect       0.19
直接        0.19
DIRECT      0.18
indirectly  0.17
Activations Density 0.044%