INDEX
Explanations
scientific terminology or references
mentions of "science."
Negative Logits
STATES     -0.64
terior     -0.63
oser       -0.62
LOAD       -0.61
theless    -0.61
ESH        -0.61
ription    -0.61
••         -0.60
raine      -0.60
leased     -0.60
Positive Logits
Fiction    1.27
fiction    1.23
craft      0.95
icist      0.91
fiction    0.90
mong       0.83
literacy   0.80
sonian     0.76
lab        0.76
istries    0.75
Activations Density: 0.032%