Explanations
mentions of high-ranking individuals, particularly those described as "senior"
references to individuals in senior positions or roles
Negative Logits
Ĥİ          -0.77
doors       -0.72
pel         -0.71
Hispan      -0.70
ãĥķãĤ©      -0.67
AUT         -0.66
plex        -0.64
igor        -0.64
laws        -0.63
aster       -0.62
Positive Logits
citiz        0.95
iating       0.90
aide         0.87
iors         0.85
iate         0.85
ially        0.83
iates        0.81
doms         0.78
ity          0.78
adviser      0.75
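On dashboards of this kind, the positive and negative logit lists are commonly obtained by projecting the feature's decoder direction through the model's unembedding matrix and keeping the highest- and lowest-scoring vocabulary tokens. The following is a minimal NumPy sketch of that convention. The helper name `top_logit_effects`, the argument shapes, and the absence of any layer-norm or rescaling step are assumptions made for illustration, not the dashboard's actual code.

```python
import numpy as np


def top_logit_effects(w_dec_row, w_u, vocab, k=10):
    """Hypothetical helper: rank vocabulary tokens by a feature's direct logit effect.

    Assumes w_dec_row has shape (d_model,) and w_u has shape (d_model, vocab_size).
    No layer norm or rescaling is applied (an assumption, not the dashboard's method).
    """
    scores = np.asarray(w_dec_row) @ np.asarray(w_u)  # shape (vocab_size,)
    order = np.argsort(scores)  # token indices, lowest score first
    negative = [(vocab[i], float(scores[i])) for i in order[:k]]
    positive = [(vocab[i], float(scores[i])) for i in order[::-1][:k]]
    return negative, positive


# Toy example: a 2-dimensional residual stream and a 4-token vocabulary.
vocab = ["a", "b", "c", "d"]
w_dec_row = np.array([1.0, 0.0])
w_u = np.array([[0.5, -0.2, 0.9, 0.1],
                [0.3, 0.3, 0.3, 0.3]])
neg, pos = top_logit_effects(w_dec_row, w_u, vocab, k=2)
print(neg)  # [('b', -0.2), ('d', 0.1)]
print(pos)  # [('c', 0.9), ('a', 0.5)]
```

With k=10 and a real model's weights, the two returned lists would correspond in shape to the Negative Logits and Positive Logits tables.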
Activation Density: 0.015%
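Activation density is usually reported as the share of evaluated tokens on which the feature fires. A density of 0.015% therefore corresponds to roughly 15 activating tokens per 100,000. The following is a minimal sketch, assuming that "fires" means an activation strictly above zero. The helper name `activation_density` and its `threshold` parameter are hypothetical.

```python
import numpy as np


def activation_density(activations, threshold=0.0):
    """Hypothetical helper: percentage of tokens whose feature activation exceeds threshold."""
    acts = np.asarray(activations, dtype=float)
    return 100.0 * np.count_nonzero(acts > threshold) / acts.size


# 15 active tokens out of 100,000 gives a density of 0.015%.
acts = np.zeros(100_000)
acts[:15] = 1.0
print(activation_density(acts))  # 0.015
```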