INDEX
Explanations
references to the word "fox" in various contexts
references to foxes
Negative Logits
Token          Logit
Effective      -0.88
Rated          -0.76
itutional      -0.71
ETH            -0.71
affer          -0.67
Interstitial   -0.66
NT             -0.66
igious         -0.66
apter          -0.63
FINE           -0.63
Positive Logits
Token          Logit
es             1.02
fox            0.86
squirrel       0.86
bat            0.82
manship        0.82
esy            0.81
hound          0.81
sey            0.76
fox            0.76
bench          0.75
Activations Density: 0.018%