INDEX
Explanations
descriptions or mentions of scientific experiments
references to scientific experiments
Negative Logits
headers    -0.67
cut        -0.62
veland     -0.62
clinton    -0.60
doms       -0.60
lake       -0.59
othy       -0.59
ritic      -0.59
flush      -0.59
miah       -0.58
Positive Logits
imental     1.06
iments      1.05
iment       0.91
ally        0.88
Experiment  0.87
experiment  0.81
eers        0.80
ually       0.75
uates       0.75
onom        0.73
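In SAE feature dashboards like this one, the negative and positive logit lists are commonly produced by projecting the feature's decoder direction through the model's unembedding matrix and keeping the most negative and most positive vocabulary entries. The sketch below assumes that convention and ignores any final layer norm or rescaling the real pipeline may apply; `logit_effects`, `decoder_row`, and `unembed` are hypothetical names, and the toy numbers are illustrative, not taken from this feature.

```python
from typing import Sequence


def logit_effects(
    decoder_row: Sequence[float],
    unembed: Sequence[Sequence[float]],
    vocab: Sequence[str],
    k: int = 10,
) -> tuple[list[tuple[str, float]], list[tuple[str, float]]]:
    """Return the k most negative and k most positive tokens for one feature.

    Assumption: decoder_row is the feature's decoder direction (length d_model)
    and unembed is the d_model x vocab_size unembedding matrix W_U.
    No layer norm is applied here, although a real dashboard might apply one.
    """
    d_model = len(decoder_row)
    # Dot the decoder direction with each column of W_U: one score per token.
    scores = [
        sum(decoder_row[i] * unembed[i][t] for i in range(d_model))
        for t in range(len(vocab))
    ]
    # Sort ascending by score so the most negative tokens come first.
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    negative = ranked[:k]         # most negative first
    positive = ranked[::-1][:k]   # most positive first
    return negative, positive


if __name__ == "__main__":
    # Toy example: d_model = 2, vocabulary of 4 tokens (hypothetical values).
    vocab = ["cut", "lake", "iment", "imental"]
    decoder_row = [1.0, -0.5]
    unembed = [[-0.2, 0.1, 0.6, 0.9], [0.4, 0.3, -0.2, -0.4]]
    neg, pos = logit_effects(decoder_row, unembed, vocab, k=2)
    print("negative:", neg)
    print("positive:", pos)
```

With the toy inputs, "cut" and "lake" come out as the most negative tokens and "imental" and "iment" as the most positive, mirroring how the tables above rank the feature's strongest logit effects.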
Activation Density: 0.016%
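Activation density is usually reported as the share of token positions, over a sample of text, on which the feature is active (activation above zero). Under that assumption, 0.016% works out to roughly 16 active positions per 100,000 tokens, so this feature fires very sparsely. A minimal sketch; `activation_density` and its `threshold` parameter are hypothetical, not the dashboard's actual code.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Percentage of token positions where the feature's activation exceeds threshold.

    Assumption: "active" means strictly greater than threshold (default 0.0).
    """
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:
            active += 1
    if total == 0:
        raise ValueError("no activations supplied")
    return 100.0 * active / total


if __name__ == "__main__":
    # 16 firing positions out of 100,000 tokens (illustrative data).
    acts = [0.0] * 99_984 + [1.0] * 16
    print(f"{activation_density(acts):.3f}%")
```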