INDEX
Explanations
questions in a written format
Negative Logits
Token                             Logit
rawdownloadcloneembedreportprint  -0.76
ords                              -0.75
apsed                             -0.69
alach                             -0.67
lag                               -0.64
broch                             -0.62
ographed                          -0.59
olves                             -0.58
estial                            -0.58
Proud                             -0.58
Positive Logits
Token    Logit
why      1.38
WHY      1.35
why      1.25
Why      1.16
Why      1.11
how      1.08
what     1.07
whether  1.03
whether  1.02
WHERE    1.00
Activations Density 0.150%