INDEX
Explanations
critical or significant terms related to social and cultural identity
Negative Logits
ant       -0.17
.ant      -0.17
tone      -0.17
Pact      -0.16
ergic     -0.15
Anton     -0.15
antine    -0.15
jev       -0.15
anton     -0.15
urr       -0.15
Positive Logits
ław       0.17
ierge     0.17
езда      0.16
ohn       0.16
ienie     0.16
inki      0.15
radient   0.15
lod       0.15
inker     0.14
è¹        0.14
Activations Density 0.019%