Explanations
references to turtles and related words
Negative Logits
nown      -0.08
roud      -0.07
nya       -0.07
ned       -0.07
esty      -0.07
thú       -0.07
_ONCE     -0.07
åłĤ       -0.07
teenth    -0.07
ROC       -0.07
Positive Logits
adow        0.07
vin         0.07
ucker       0.06
-shell      0.06
ounds       0.06
ean         0.06
igidBody    0.06
swimming    0.06
otle        0.06
gram        0.06
Activations Density 0.005%