Explanations
words related to publications and blog posts
the end of text markers
Negative Logits
Reef         -0.65
milo         -0.63
Allied       -0.60
stacks       -0.60
Lauder       -0.59
Osc          -0.58
baskets      -0.58
Samar        -0.57
Lama         -0.56
Patriarch    -0.56
Positive Logits
ources       1.03
lightly      0.98
aved         0.93
atisf        0.90
ELF          0.89
ued          0.89
ushi         0.88
omew         0.87
urgical      0.86
pecially     0.85
Activations Density 0.092%