Explanations
references to crocodiles or similar reptiles in the text
Negative Logits
uns        -0.17
ariat      -0.15
neh        -0.15
roz        -0.14
imoto      -0.14
urn        -0.14
rose       -0.14
ROWSER     -0.14
typings    -0.14
offee      -0.14
Positive Logits
codile      0.27
cro         0.25
Cro         0.24
Cro         0.24
oked        0.21
croft       0.19
cro         0.19
issant      0.18
anje        0.18
enen        0.18
Activation Density: 0.014%
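The page does not say how these numbers were produced. The sketch below shows one common way such dashboards derive them, under stated assumptions: the logit lists come from dotting the feature's decoder direction with each token's unembedding vector and keeping the k most positive and k most negative tokens, and activation density is the fraction of tokens on which the feature fires (activation > 0). The names feature_logits, top_and_bottom and activation_density, and all toy data, are hypothetical and not taken from the source.

```python
from typing import Dict, List, Tuple

# Hypothetical helper: project the feature's decoder direction onto each
# token's unembedding vector (assumption: this is how the logit lists arise).
def feature_logits(decoder_dir: List[float],
                   unembed: Dict[str, List[float]]) -> Dict[str, float]:
    return {tok: sum(d * u for d, u in zip(decoder_dir, vec))
            for tok, vec in unembed.items()}

# Return the k most positive tokens (largest first) and the k most
# negative tokens (most negative first).
def top_and_bottom(logits: Dict[str, float], k: int
                   ) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    ranked = sorted(logits.items(), key=lambda kv: kv[1])
    bottom = ranked[:k]
    top = ranked[::-1][:k]
    return top, bottom

# Fraction of tokens on which the feature is active (activation > 0).
def activation_density(acts: List[float]) -> float:
    return sum(1 for a in acts if a > 0) / len(acts)

# Toy example: a 2-dimensional "model" with four tokens (made-up vectors).
decoder_dir = [1.0, 0.0]
unembed = {
    "codile": [0.27, 0.5],
    "cro": [0.25, 0.1],
    "uns": [-0.17, 0.3],
    "rose": [-0.14, 0.0],
}
top, bottom = top_and_bottom(feature_logits(decoder_dir, unembed), k=2)
print("positive:", [tok for tok, _ in top])
print("negative:", [tok for tok, _ in bottom])
print("density:", activation_density([0.0, 0.3, 0.0, 0.0, 1.2,
                                      0.0, 0.0, 0.0, 0.0, 0.0]))
```

With real model weights the same ranking would produce lists like the ones above. A density of 0.014% means the feature fires on roughly 14 of every 100,000 tokens, which fits a narrow concept such as crocodiles.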