Explanations
references to crocodiles
Negative Logits
uns       -0.18
ariat     -0.17
neh       -0.16
KO        -0.15
ROWSER    -0.15
à¹Ģà¸ļ    -0.14
nov       -0.14
inema     -0.14
roz       -0.14
ardi      -0.14
Positive Logits
cro       0.31
codile    0.29
Cro       0.29
Cro       0.28
oked      0.23
cro       0.23
croft     0.22
issant    0.22
chet      0.18
esor      0.17
Activations Density 0.009%