INDEX
Explanations
references to animals or animal-related terms
Negative Logits
Compton      -0.77
heit         -0.76
unda         -0.72
lain         -0.72
landers      -0.70
ounced       -0.69
creen        -0.68
lining       -0.66
hower        -0.66
itudinal     -0.65
Positive Logits
carc         0.99
kingdom      0.94
aclysm       0.89
cruelty      0.88
oreal        0.87
animals      0.87
arium        0.86
lover        0.83
welfare      0.83
mammals      0.82
Activations Density: 0.037%