INDEX
Explanations
words related to destruction or annihilation
Negative Logits
Token       Logit
ounce       -0.18
же          -0.17
gren        -0.16
ırı         -0.16
ounces      -0.16
ange        -0.15
o           -0.15
ically      -0.15
enced       -0.15
ocuk        -0.15
Positive Logits
Token       Logit
ivers       0.24
ihilation   0.24
ulled       0.21
uity        0.21
exe         0.19
yang        0.18
s           0.18
yi          0.17
ointed      0.17
uni         0.16
Activations Density: 0.007%