INDEX
Explanations
references to science and its various disciplines
Negative Logits
lass      -0.18
rescia    -0.17
loe       -0.17
iser      -0.17
tra       -0.16
/down     -0.16
neo       -0.16
ting      -0.16
ter       -0.15
lander    -0.15
Positive Logits
-fiction  0.25
/math     0.24
/engine   0.24
ENCES     0.18
/Math     0.18
fiction   0.18
/art      0.18
fully     0.17
/stat     0.16
y         0.16
Activations Density 0.047%