Explanations
mentions of specific organizations or entities
Negative Logits
ãĤ¨ãĥ«      -0.82
é¾įå¥ij士    -0.71
¶æ          -0.69
cit         -0.67
é¾įå        -0.67
Ĭ±          -0.67
ãĤ¤ãĥĪ      -0.65
Herm        -0.65
ranging     -0.64
taining     -0.64
Positive Logits
recognizes  1.06
reacted     1.03
has         1.00
considers   1.00
wants       0.96
intervened  0.95
expects     0.95
insists     0.94
encourages  0.94
responded   0.93
Activations Density 0.381%