INDEX
Explanations
instances of the word "shark"
references to sharks
Negative Logits
ISTER     -0.82
ories     -0.76
Hemp      -0.72
ISION     -0.67
haar      -0.66
ijah      -0.64
ndra      -0.64
onsense   -0.64
mble      -0.63
icably    -0.63
Positive Logits
ulic      0.96
fins      0.88
fish      0.84
vati      0.82
izont     0.80
sharks    0.80
mong      0.80
iform     0.78
bite      0.77
Sharks    0.77
Activations Density 0.018%