Explanations
scientific terms and concepts
references to scientific concepts and terms
Negative Logits
drops     -0.75
torn      -0.70
skirts    -0.69
oning     -0.68
eu        -0.67
pora      -0.66
matched   -0.65
eper      -0.64
erous     -0.64
tower     -0.64
Positive Logits
fiction        1.02
curiosity      0.97
literacy       0.93
ĨĴ             0.89
scientist      0.88
research       0.86
physicist      0.86
breakthrough   0.86
misconduct     0.86
inquiry        0.85
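Ranked positive and negative logit lists of this kind are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and sorting the vocabulary by the resulting score. The sketch below illustrates that idea in pure Python. It is an assumption about how such lists can be computed, not a description of how this particular dashboard produced its numbers; the function names `logit_effects` and `top_and_bottom` and the toy vectors are hypothetical.

```python
# Hypothetical sketch: rank vocabulary tokens by a feature's direct effect on logits.
# Assumption: a token's score is the dot product of the feature's decoder
# direction (length d_model) with that token's column of the unembedding
# matrix (shape d_model x vocab_size), i.e. a "logit lens" view of the feature.

def logit_effects(decoder_dir, unembed, vocab):
    """Return {token: score}, where score = decoder_dir . unembed[:, t]."""
    d_model = len(decoder_dir)
    scores = {}
    for t, token in enumerate(vocab):
        scores[token] = sum(decoder_dir[i] * unembed[i][t] for i in range(d_model))
    return scores


def top_and_bottom(scores, k):
    """Return the k highest-scoring and k lowest-scoring (token, score) pairs."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1])
    bottom = ranked[:k]        # most negative first
    top = ranked[::-1][:k]     # most positive first
    return top, bottom


# Toy example with a 2-dimensional model and a 3-token vocabulary (made-up values).
vocab = ["fiction", "drops", "research"]
decoder_dir = [1.0, 0.5]
unembed = [[1.0, -0.5, 0.6],
           [0.2, -0.4, 0.4]]
top, bottom = top_and_bottom(logit_effects(decoder_dir, unembed, vocab), 2)
print("positive:", top)
print("negative:", bottom)
```

Sorting once and slicing both ends keeps the example short; for a real vocabulary of tens of thousands of tokens, a partial selection such as `heapq.nlargest` would avoid a full sort.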
Activations Density 0.021%
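Activation density is usually read as the fraction of sampled tokens on which the feature is active. Under that assumption, 0.021% corresponds to the feature firing on about 21 of every 100,000 tokens, so it is a sparse feature. A minimal sketch of the calculation follows, assuming "active" means an activation strictly greater than zero; the function name `activation_density` and its `threshold` parameter are hypothetical.

```python
# Hypothetical sketch: activation density as the share of tokens where a feature fires.
# Assumption: a feature "fires" on a token when its activation exceeds `threshold` (default 0).

def activation_density(activations, threshold=0.0):
    """Return the fraction of activations strictly greater than threshold."""
    if not activations:
        raise ValueError("need at least one activation value")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Toy example: the feature fires on 21 out of 100,000 tokens.
acts = [0.0] * 99_979 + [1.0] * 21
density = activation_density(acts)
print(f"{density * 100:.3f}%")  # formats the fraction as a percentage
```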