Explanations
Instances of high-frequency phrases or elements within text
Negative Logits
Token     Logit
onne      -0.15
ather     -0.14
oly       -0.14
uning     -0.14
¢         -0.14
elta      -0.14
McL       -0.14
peria     -0.14
grep      -0.14
hatt      -0.14

Positive Logits
Token     Logit
Vad       0.16
895       0.16
ogram     0.15
rix       0.14
FAG       0.14
tslib     0.14
roph      0.14
RPM       0.14
.ix       0.14
nish      0.14
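
The two lists rank vocabulary tokens by how strongly this feature appears to push their logits down or up. A common way SAE dashboards compute such values is to multiply the feature's decoder vector by the model's unembedding matrix and keep the top and bottom k entries. The following is a minimal sketch that assumes this method. The helper names logit_effects and top_and_bottom, the toy vocabulary and the toy matrices are all hypothetical and do not reproduce the values listed here.

```python
from typing import List, Tuple


def logit_effects(decoder_row: List[float], W_U: List[List[float]]) -> List[float]:
    """Project one feature's decoder direction through the unembedding.

    Assumption: decoder_row has length d_model and W_U is d_model x d_vocab,
    so the result (decoder_row @ W_U) has one entry per vocabulary token.
    """
    d_vocab = len(W_U[0])
    return [
        sum(d * row[t] for d, row in zip(decoder_row, W_U))
        for t in range(d_vocab)
    ]


def top_and_bottom(
    effects: List[float], vocab: List[str], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and the k most negative (token, effect) pairs."""
    ranked = sorted(zip(vocab, effects), key=lambda pair: pair[1])
    most_negative = ranked[:k]          # ascending order: most negative first
    most_positive = ranked[::-1][:k]    # descending order: most positive first
    return most_positive, most_negative


# Toy example with made-up numbers (hypothetical, not this feature's weights).
vocab = ["tok_a", "tok_b", "tok_c", "tok_d"]
decoder_row = [1.0, 0.5]
W_U = [
    [0.10, -0.10, 0.00, 0.06],
    [0.04, -0.10, -0.28, 0.04],
]
positive, negative = top_and_bottom(logit_effects(decoder_row, W_U), vocab, k=2)
print(positive)  # tok_a then tok_d, largest effect first
print(negative)  # tok_b then tok_c, most negative effect first
```

Because the ranking uses only the decoder direction and the unembedding, it describes the feature's direct effect on the output logits. It ignores any influence routed through later layers.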
Activations Density 0.004%
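
An activation density of 0.004% means the feature fires on roughly 4 of every 100,000 tokens it was evaluated on. The following is a minimal sketch of that statistic. It assumes that "active" means an activation strictly above zero, which is a common convention, but the dashboard does not state its threshold or its dataset. The function name activation_density and the toy data are hypothetical.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of token positions where the activation exceeds threshold.

    Assumption: threshold=0.0, i.e. any positive activation counts as firing.
    """
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Toy data: 4 firing positions out of 100,000 tokens.
acts = [0.0] * 99_996 + [1.2, 0.3, 2.0, 0.7]
print(f"{activation_density(acts):.3f}%")  # 0.004%
```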