Explanations
references to science and scientific concepts
Negative Logits
Token      Logit
iser       -0.18
itre       -0.17
tra        -0.17
neath      -0.17
ted        -0.16
rescia     -0.16
/down      -0.16
lander     -0.16
ting       -0.15
lass       -0.15
Positive Logits
Token      Logit
/engine    0.29
/math      0.27
-fiction   0.25
/Math      0.21
ENCES      0.19
/stat      0.18
fiction    0.18
/art       0.18
y          0.18
emet       0.17
Activations Density: 0.047%