INDEX
Explanations
references to political figures or events, particularly related to security and controversies
Negative Logits
ende          -0.72
incorpor      -0.71
ponder        -0.71
mushroom      -0.69
imagination   -0.68
controvers    -0.65
retrieval     -0.65
rundown       -0.64
electorate    -0.64
unwanted      -0.64
Positive Logits
ï¸ı           1.23
ÃĽ            0.96
ï¸            0.93
¯             0.92
STEM          0.88
cue           0.86
°             0.84
said          0.83
âĢł           0.80
#$            0.80
Activations Density: 0.795%