INDEX
Explanations
magazine titles
references to magazines
Negative Logits
Token       Logit
acted       -0.75
speaking    -0.69
cker        -0.67
qqa         -0.66
Xi          -0.66
ensions     -0.64
haul        -0.64
zh          -0.63
æĪ¦         -0.63
adow        -0.63
Positive Logits
Token           Logit
azine           1.00
publisher       0.94
magazine        0.91
subscriptions   0.89
editor          0.89
publishers      0.88
magazines       0.87
azines          0.84
editors         0.81
covers          0.80
Activations Density: 0.019%