Explanations
mentions of marine animals, specifically killer whales
references to killer whales
Negative Logits
ational    -0.91
ional      -0.91
ourced     -0.88
ADRA       -0.87
edu        -0.86
ibly       -0.84
ured       -0.83
isse       -0.83
uration    -0.82
bles       -0.78
Positive Logits
killer      1.00
whales      0.99
whale       0.87
killer      0.85
spree       0.84
instinct    0.84
knife       0.82
blow        0.76
killers     0.73
fish        0.72
Activations Density: 0.036%