INDEX
Explanations
studies or research-related phrases
references to research studies and their findings
Negative Logits
Strongh       -0.75
Petty         -0.70
Loyal         -0.70
fty           -0.68
Redemption    -0.65
atra          -0.64
ibles         -0.63
Newt          -0.60
Sundays       -0.60
die           -0.60
Positive Logits
studies          1.14
examining        1.08
evaluating       1.06
conducted        1.02
studying         1.00
confirming       0.98
estimating       0.97
correl           0.96
documenting      0.96
demonstrating    0.96
Activations Density 0.181%