INDEX
Explanations
references to leadership roles and authority figures
Negative Logits
elpers        -0.17
onen          -0.15
-dashboard    -0.15
arel          -0.15
aired         -0.15
uset          -0.14
ndo           -0.14
dash          -0.14
ük            -0.14
yers          -0.14
Positive Logits
cl            0.16
reon          0.14
Bottom        0.14
isLoggedIn    0.13
omain         0.13
excell        0.13
بخ            0.13
ibble         0.13
ount          0.13
æŀ            0.13
Activations Density 0.140%