INDEX
Explanations
words and phrases that suggest duality, contrast, or specific identity
Negative Logits
sunrise      -0.15
atoon        -0.15
IGO          -0.14
Canter       -0.14
кту          -0.14
ーパ         -0.13
bote         -0.13
utherford    -0.13
occasion     -0.13
.quick       -0.13
Positive Logits
ewan         0.16
本           0.15
edu          0.15
quina        0.15
hin          0.15
SG           0.14
uka          0.14
608          0.14
cluded       0.14
ixin         0.14
Activations Density 0.017%