Explanations
phrases related to experiments or trials
references to experimental conditions or procedures
Negative Logits
andra     -0.84
HCR       -0.79
olulu     -0.76
atra      -0.73
Words     -0.71
veland    -0.71
ACP       -0.71
cript     -0.70
claimer   -0.68
say       -0.68
Positive Logits
imental        1.10
Experimental   0.81
experimental   0.79
ists           0.76
manip          0.75
laboratory     0.75
findings       0.75
experiments    0.74
explor         0.71
Prototype      0.70
Activations Density: 0.011%