INDEX
Explanations
mentions of the word "science" in various contexts
Negative Logits
INGS        -0.18
га          -0.17
malink      -0.17
rone        -0.16
udy         -0.16
бор         -0.15
Sciences    -0.15
ьют         -0.15
undry       -0.15
ivr         -0.15
Positive Logits
fiction      0.33
-fiction     0.30
Fiction      0.28
fiction      0.23
fictional    0.20
fair         0.19
/math        0.18
Fair         0.17
fict         0.17
-policy      0.17
Activations Density: 0.021%