INDEX
Explanations
specific keywords related to politics or specific entities
references to social and political issues
Negative Logits
rals          -0.73
ousands       -0.70
Flavoring     -0.64
Cosponsors    -0.63
rils          -0.62
juries        -0.59
ãĥł           -0.59
osures        -0.59
leys          -0.58
ructure       -0.58
Positive Logits
speak         1.23
territory     1.06
incarn        0.96
jargon        0.95
shorthand     0.94
meets         0.92
slang         0.91
actly         0.84
heaven        0.84
reborn        0.83
Activations Density: 0.408%