INDEX
Explanations
mentions of the letter "A" with high activation values
the letter 'A' in various contexts
Negative Logits
Token               Logit
Vaugh               -0.67
enegger             -0.65
Alph                -0.63
Indigo              -0.60
reports             -0.60
(token not shown)   -0.59
Nicaragua           -0.58
optics              -0.57
Everton             -0.57
Param               -0.57
Positive Logits
Token       Logit
cknowled    1.19
verages     1.18
ussie       1.17
cknow       1.11
uctions     1.10
irst        1.01
lyss        0.98
roma        0.98
ryan        0.96
perture     0.94
Activations Density 0.105%
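
The density figure above is not defined on this page. A common reading, assumed here, is the fraction of sampled token positions on which the feature's activation is non-zero. The sketch below shows that arithmetic only; the function name `activation_density`, the zero threshold, and the sample data are hypothetical and were chosen so the result matches the 0.105% shown, not taken from the real activations.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of token positions where the
    feature fires at all. The dashboard may define it differently.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Synthetic example (not real data): 20,000 token positions, 21 of them active.
sample = [0.0] * 19_979 + [1.5] * 21
density = activation_density(sample)
print(f"Activations Density {density:.3%}")
```

Under this reading, a density of 0.105% means the feature fires on roughly 1 in every 950 tokens, which fits a sparse feature tied to a narrow pattern such as a specific letter.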