Explanations
Instances of the word "same"
Negative Logits
apas     -0.18
dale     -0.16
inous    -0.16
_ABI     -0.16
wear     -0.15
xn       -0.15
land     -0.14
egot     -0.14
opers    -0.14
onaut    -0.14
Positive Logits
-sex      0.24
ucci      0.17
sterile   0.15
样        0.15
ymoon     0.15
ison      0.14
ediator   0.14
ط         0.14
-day      0.14
oftware   0.14
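The two logit lists rank vocabulary tokens by how strongly this feature pushes their logits down or up. A minimal sketch of how such lists are commonly produced, assuming (as in many SAE dashboards) that each value is the dot product of the feature's decoder direction with a token's column of the unembedding matrix. The helper names `feature_logit_effects` and `top_and_bottom`, and the toy matrices, are hypothetical and not taken from this dashboard's code.

```python
from typing import Dict, List, Sequence, Tuple


# Hypothetical helper: "logit lens" projection of one SAE feature.
# Assumes unembed is indexed [d_model][vocab_size] and decoder_dir has length d_model.
def feature_logit_effects(
    decoder_dir: Sequence[float],
    unembed: Sequence[Sequence[float]],
    vocab: Sequence[str],
) -> Dict[str, float]:
    if len(unembed) != len(decoder_dir):
        raise ValueError("decoder_dir and unembed disagree on d_model")
    effects: Dict[str, float] = {}
    for j, token in enumerate(vocab):
        # Dot product of the decoder direction with token j's unembedding column.
        effects[token] = sum(d * row[j] for d, row in zip(decoder_dir, unembed))
    return effects


def top_and_bottom(
    effects: Dict[str, float], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    # Sort ascending once: the head holds the most negative tokens,
    # the reversed head holds the most positive ones.
    ranked = sorted(effects.items(), key=lambda kv: kv[1])
    bottom = ranked[:k]
    top = ranked[::-1][:k]
    return top, bottom


if __name__ == "__main__":
    # Toy 2-dimensional model with a 3-token vocabulary (made-up numbers).
    effects = feature_logit_effects(
        [1.0, 0.5],
        [[0.1, -0.2, 0.2], [0.0, 0.1, 0.08]],
        ["same", "land", "-sex"],
    )
    top, bottom = top_and_bottom(effects, k=1)
    print("positive:", top)
    print("negative:", bottom)
```

With a real model the same two steps run over the full vocabulary with k = 10, which matches the ten entries shown in each list.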
Activation Density: 0.023%
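Assuming activation density is the share of sampled tokens on which the feature's activation is above zero, 0.023% works out to roughly 23 firing tokens per 100,000, so this is a very sparse feature. A minimal sketch of that calculation; the function name `activation_density` and its `threshold` parameter are hypothetical, and the dashboard may use a different sample or cutoff.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Return the percentage of tokens whose activation exceeds threshold."""
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:
            active += 1
    if total == 0:
        raise ValueError("need at least one activation")
    return 100.0 * active / total


if __name__ == "__main__":
    # 23 firing tokens out of 100,000 sampled tokens (illustrative counts).
    sample = [0.0] * 99_977 + [1.0] * 23
    print(f"{activation_density(sample):.3f}%")
```

Counting in a single pass keeps memory constant, so the same function works on a generator streaming activations from a large corpus.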