Explanations
Words or phrases that evoke strong emotional reactions or carry significant implications, especially in political or social contexts
Negative Logits
Token          Logit
icult          -0.74
ording         -0.67
aimon          -0.66
oted           -0.65
iewicz         -0.65
oids           -0.65
assian         -0.64
utterstock     -0.64
icultural      -0.63
emonic         -0.62
Positive Logits
Token          Logit
vre            1.15
¬              0.99
lette          0.89
·              0.88
tis            0.85
sin            0.84
ÃįÃį           0.82
s              0.79
¹              0.79
¸              0.76
Activations Density: 0.003%