Explanations
instances of the word "both" and phrases indicating duality or partnership
Negative Logits
difference    -0.16
all           -0.16
onest         -0.15
-ÑĤо          -0.15
Difference    -0.15
difference    -0.15
onth          -0.15
Gors          -0.14
aybe          -0.14
Difference    -0.14
Positive Logits
åĮ            0.16
omanip        0.16
ế             0.16
illery        0.16
rem           0.14
amber         0.14
haul          0.14
087           0.14
adow          0.14
086           0.14
Activations Density: 0.150%