INDEX
Explanations
mentions of harbors
references to harbors or similar locations
Negative Logits
eds              -0.81
cious            -0.78
iod              -0.78
millenn          -0.71
TPPStreamerBot   -0.71
LM               -0.68
ener             -0.68
andr             -0.66
ammy             -0.64
yson             -0.63
Positive Logits
harbor           1.23
harbour          1.11
bors             0.97
harb             0.95
lot              0.82
cove             0.79
Spit             0.74
Harbor           0.73
lash             0.72
rait             0.70
Activations Density: 0.009%