INDEX
Explanations
phrases related to scientific theories
references to various theories
Negative Logits
Token        Logit
---------    -0.67
awn          -0.65
Serv         -0.63
Lic          -0.61
unwanted     -0.61
Borders      -0.61
elcome       -0.60
Host         -0.60
Contact      -0.58
lete         -0.57
Positive Logits
Token          Logit
theory         3.84
Theory         2.85
theories       2.65
theorists      2.39
hypothesis     2.21
theorist       2.17
theoret        1.79
theor          1.75
hypotheses     1.62
theoretical    1.54
Activations Density 0.023%