INDEX
Explanations
terms related to transition and transformation concepts
Negative Logits
ices    -0.17
oop     -0.16
ong     -0.15
adius   -0.15
ows     -0.14
тар     -0.14
strap   -0.14
ouch    -0.14
atsby   -0.14
рай     -0.14
Positive Logits
/trans    0.29
aksi      0.27
ylvania   0.22
-trans    0.20
ylv       0.20
.trans    0.18
verse     0.18
sexual    0.18
mere      0.17
(trans    0.17
Activations Density 0.041%