Explanations
Phrases expressing dualities or contrasts
Concepts that involve duality or being twofold
Negative Logits
sburgh    -0.74
ugu       -0.69
uez       -0.69
Volks     -0.65
Kard      -0.61
Century   -0.61
dq        -0.60
uffer     -0.59
Caption   -0.59
tions     -0.58
Positive Logits
sexes     1.38
sides     1.15
halves    1.08
genders   1.07
thirds    0.72
extremes  0.70
imilar    0.70
ocating   0.68
ドラゴン  0.66
animate   0.64
Activations Density: 0.060%