INDEX
Explanations
references to animal welfare and ethical treatment of animals in the context of production and consumption
Negative Logits
iskey      -0.17
viper      -0.17
marshall   -0.15
onis       -0.15
osten      -0.15
úb         -0.15
nis        -0.14
511        -0.14
ibri       -0.14
SYM        -0.14
Positive Logits
cruelty    0.32
animal     0.32
Animal     0.30
Cru        0.28
Animal     0.27
animal     0.26
animals    0.24
cru        0.22
humane     0.22
cruel      0.20
Activations Density 0.107%