Explanations
words related to animals and pets
Negative Logits
éĹĺ         -0.78
DERR        -0.76
unda        -0.73
artz        -0.73
IDER        -0.72
sclerosis   -0.70
WARN        -0.70
ONES        -0.69
seeded      -0.68
Shap        -0.68
Positive Logits
ertodd      1.00
puppies     0.99
fights      0.94
dogs        0.94
barking     0.92
fight       0.90
heter       0.89
fighting    0.87
riages      0.87
pee         0.85
Activations Density 0.855%