INDEX
Explanations
words related to politics, government positions, and legal matters, potentially focusing on controversies or scandals
fragments containing references to individuals, places, and events
Negative Logits
Token    Logit
Amon     -0.87
attach   -0.80
arrang   -0.79
ATOR     -0.77
AT       -0.74
ACY      -0.74
Amp      -0.73
asin     -0.72
awa      -0.71
ammy     -0.71
Positive Logits
Token    Logit
stein    0.94
tein     0.91
pedia    0.86
wyn      0.84
ria      0.82
rian     0.81
edia     0.81
´        0.80
rington  0.80
ri       0.79
Activations Density: 0.417%