Explanations
references to animals, specifically sheep
references to sheep and related terms
Negative Logits
Token      Logit
entric     -0.71
ENTS       -0.69
arcity     -0.69
validity   -0.68
GOODMAN    -0.66
eminent    -0.66
vehement   -0.65
devast     -0.64
undue      -0.61
rylic      -0.61
Positive Logits
Token      Logit
ishly      1.02
sheep      1.00
dog        0.92
dogs       0.90
skin       0.89
itte       0.85
poke       0.84
pei        0.79
meat       0.79
erness     0.77
Activation Density: 0.005%