INDEX
Explanations
scientific research-related terms and findings
statements about surprising scientific results and their implications
Negative Logits
hating         -0.77
Exile          -0.76
Rated          -0.74
Theft          -0.70
Seasons        -0.70
Sovereign      -0.68
Destination    -0.67
Expend         -0.66
Municipal      -0.65
despise        -0.65
Positive Logits
intriguing     1.23
hypothesis     1.13
breakthrough   1.12
implications   1.09
exciting       1.09
hypothes       1.08
tantal         1.08
hypotheses     1.07
promising      1.07
preliminary    1.06
Activations Density: 0.424%