INDEX
Explanations
mentions of different types of animals, specifically goats and donkeys, as well as specific names like "Paula"
references to goats and related terms
Negative Logits
orneys    -0.78
ept       -0.69
cknowled  -0.68
Barnett   -0.67
hips      -0.67
aneously  -0.65
urrent    -0.65
stresses  -0.65
eering    -0.63
nant      -0.63
Positive Logits
goat      1.13
goats     1.06
Cheese    1.05
Goat      1.03
asus      0.94
cheese    0.94
qv        0.83
wright    0.76
weed      0.75
ghan      0.72
Activations Density 0.007%