INDEX
Explanations
references to the concept of science across various contexts
Negative Logits
tra      -0.19
neo      -0.17
INGS     -0.17
neath    -0.17
ted      -0.16
lands    -0.16
rias     -0.16
ter      -0.15
isma     -0.15
acher    -0.15
Positive Logits
-fiction     0.36
fiction      0.34
Fiction      0.30
/math        0.27
fiction      0.25
fictional    0.24
/engine      0.24
/art         0.21
fair         0.19
/Math        0.19
Activations Density: 0.036%