INDEX
Explanations
questions that seek to understand reasons or motivations
Negative Logits
Token      Logit
$          -0.72
intahan    -0.70
Cortes     -0.70
")[        -0.67
Manning    -0.66
metallo    -0.64
Lass       -0.64
CGSize     -0.64
DEP        -0.63
ovatel     -0.63

Positive Logits
Token      Logit
why        1.75
why        1.68
Whyte      1.61
Why        1.58
Why        1.55
WHY        1.49
WHY        1.48
Waarom     1.42
Warum      1.40
varför     1.34
Activations Density 0.059%