INDEX
Explanations
references to violent or harmful actions, often involving individuals or entities labeled as "killers"
references to "killer" in various contexts, especially relating to crime or wildlife
Negative Logits
urn       -0.78
ional     -0.78
bles      -0.77
ational   -0.74
arb       -0.71
auri      -0.71
rity      -0.70
REF       -0.69
adr       -0.69
ourced    -0.69
Positive Logits
killer    1.32
killers   1.13
killer    1.09
Killer    1.07
whales    0.84
knife     0.76
whale     0.75
killers   0.75
assassin  0.75
ipop      0.73
Activations Density 0.009%