INDEX
Explanations
phrases related to scientific topics or concepts
references to scientific concepts and discussions
Negative Logits
drops     -0.76
zik       -0.70
matched   -0.68
erous     -0.67
tower     -0.67
atra      -0.66
pora      -0.66
skirts    -0.65
capped    -0.65
ipop      -0.65
Positive Logits
fiction      1.01
curiosity    0.97
literacy     0.93
misconduct   0.88
ĨĴ           0.87
research     0.86
inquiry      0.85
scientist    0.84
sciences     0.84
science      0.82
Activations Density: 0.019%