INDEX
Explanations
instances of the word "turn" and its variations
Negative Logits
dro       -0.16
zer       -0.16
oop       -0.15
layıcı    -0.14
yers      -0.14
lum       -0.14
rone      -0.14
_Inter    -0.14
ually     -0.14
layan     -0.14
Positive Logits
pike      0.35
stile     0.29
tables    0.25
ips       0.22
moil      0.22
igy       0.21
tabl      0.21
iej       0.20
about     0.20
itin      0.20
Activations Density 0.011%