INDEX
Explanations
words related to research findings or outcomes
instances of the word "results."
Negative Logits
horn       -0.75
remember   -0.70
let        -0.69
ho         -0.67
audi       -0.67
abel       -0.66
haw        -0.64
irk        -0.64
evangel    -0.63
crou       -0.63
Positive Logits
results    3.74
results    3.00
Results    2.71
result     2.41
Results    2.16
findings   1.93
result     1.92
outcomes   1.80
Result     1.69
outcome    1.67
Activations Density 0.012%