Explanations
references to specific animals, particularly foxes
the word "fox" and variations of it within the text
Negative Logits
Token       Logit
assy        -0.74
inval       -0.69
Definition  -0.65
apter       -0.65
Pradesh     -0.63
Edison      -0.63
rians       -0.63
atoon       -0.62
Scotia      -0.62
rogens      -0.61
Positive Logits
Token       Logit
conn        1.34
es          0.96
hound       0.92
hair        0.91
holes       0.90
y           0.89
hole        0.87
haw         0.86
fox         0.85
worthy      0.85
Activations Density: 0.016%