INDEX
Explanations
mentions of submarines
references to submarines and submarine-related terminology
Negative Logits
place         -0.79
cb            -0.73
grain         -0.73
Raw           -0.71
None          -0.71
Marketable    -0.70
giving        -0.70
ellen         -0.69
tg            -0.69
reads         -0.68
Positive Logits
submarine     1.31
submarines    1.27
submar        1.22
marine        1.04
submer        0.96
torped        0.89
iltration     0.88
diving        0.88
boats         0.86
submerged     0.85
Activations Density: 0.005%
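
To put the density figure in more concrete terms, the short sketch below converts the percentage into a fraction and an approximate "one in N tokens" rate. It assumes that density means the share of sampled tokens on which the feature has a nonzero activation; the page itself does not define the term, so treat that reading, and the variable names, as illustrative only.

```python
# Density as displayed on the feature card, in percent.
density_percent = 0.005

# Convert the percentage to a plain fraction of tokens.
density_fraction = density_percent / 100

# Assumption: density is the fraction of tokens with a nonzero activation,
# so its reciprocal approximates how many tokens pass per activation.
tokens_per_activation = round(1 / density_fraction)

print(f"fraction of tokens: {density_fraction:.6f}")
print(f"about 1 activation per {tokens_per_activation:,} tokens")
```

Under that assumption the feature fires on roughly one token in twenty thousand, which fits a narrow topic such as submarines.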