INDEX
Explanations
references to individuals taking action or making decisions
Negative Logits
Token      Logit
YD         -0.15
filmy      -0.15
plx        -0.14
rimp       -0.14
acades     -0.14
Cassidy    -0.14
ëŀ         -0.14
id         -0.14
اÙĦات      -0.13
urf        -0.13
Positive Logits
Token      Logit
iesen      0.16
Baron      0.16
yleft      0.16
ensch      0.15
amac       0.15
ownt       0.14
zahl       0.14
ipt        0.14
iesta      0.14
ahren      0.14
Activations Density: 0.159%