INDEX
Explanations
words related to conflicts or disagreements, particularly in social and political contexts
Negative Logits
etti      -0.20
etri      -0.14
edula     -0.14
âĨĴ↵↵     -0.14
sip       -0.14
tti       -0.14
BOSE      -0.14
esen      -0.14
ÙħÙĨد    -0.14
_ACTIVE   -0.14
Positive Logits
anj           0.20
hol           0.17
ýt            0.16
aq            0.16
ats           0.16
ãĥ³ãĥĢ        0.15
δα            0.15
icom          0.15
Intermediate  0.15
els           0.15
Activations Density 0.018%