INDEX
Explanations
references to positions of authority and leadership
Negative Logits
ŀæĢ§    -0.15
Swift   -0.15
swift   -0.14
assa    -0.14
ÎĶε     -0.13
unce    -0.13
ÅĽnie   -0.13
carr    -0.13
urer    -0.13
115     -0.13
Positive Logits
azer    0.16
ookie   0.15
Beth    0.14
ex      0.14
lef     0.14
John    0.13
Grant   0.13
metam   0.13
ẩm      0.13
Ùĥس     0.13
Activations Density: 0.337%