Explanations
references to sharks in various contexts, including attacks, movies, and metaphors
references to sharks
Negative Logits
ories        -0.80
ISTER        -0.76
haar         -0.71
ISION        -0.70
Winchester   -0.69
mble         -0.67
dit          -0.67
onsense      -0.66
ijah         -0.65
Hemp         -0.63
Positive Logits
fins         0.99
Sharks       0.92
sharks       0.88
ulic         0.86
mong         0.85
fish         0.83
shark        0.82
vati         0.82
Shark        0.81
affe         0.79
Activations Density: 0.013%