INDEX
Explanations
questions starting with 'what about'
questions that begin with "What about"
Negative Logits
KO             -0.83
obi            -0.73
¯¯¯¯¯¯¯¯       -0.73
Constructed    -0.72
cil            -0.71
proverb        -0.68
zar            -0.66
arm            -0.66
HT             -0.62
cycle          -0.62
Positive Logits
!?              0.67
?]              0.66
illon           0.66
...?            0.64
fairness        0.64
ickets          0.64
really          0.63
yip             0.63
berra           0.62
bourg           0.60
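A common way to produce token lists like these is to project the feature's decoder direction through the model's unembedding matrix and rank the resulting per-token scores. The page does not say how its numbers were computed (real dashboards may normalize or center them), so the sketch below only assumes that plain dot-product definition. `logit_effects` and `top_and_bottom` are hypothetical helpers, and small Python lists stand in for real model weights.

```python
from typing import Dict, List, Tuple


def logit_effects(decoder_dir: List[float],
                  unembed: List[List[float]],
                  vocab: List[str]) -> Dict[str, float]:
    """Direct logit effect of one feature on every vocabulary token.

    decoder_dir: the feature's decoder vector (length d_model).
    unembed: unembedding matrix with d_model rows and len(vocab) columns.
    Returns token -> dot(decoder_dir, column of unembed for that token).
    """
    if len(unembed) != len(decoder_dir):
        raise ValueError("unembed must have one row per decoder dimension")
    effects: Dict[str, float] = {}
    for v, token in enumerate(vocab):
        # Dot product of the decoder direction with column v of the unembedding.
        effects[token] = sum(d * row[v] for d, row in zip(decoder_dir, unembed))
    return effects


def top_and_bottom(effects: Dict[str, float],
                   k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and the k most negative (token, score) pairs."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])
    negative = ranked[:k]          # ascending order: most negative first
    positive = ranked[::-1][:k]    # descending order: most positive first
    return positive, negative


if __name__ == "__main__":
    # Toy 2-dimensional model with a 3-token vocabulary (made-up numbers).
    vocab = ["a", "b", "c"]
    decoder_dir = [1.0, -1.0]
    unembed = [[0.5, 0.0, -0.2],
               [0.1, 0.3, 0.4]]
    pos, neg = top_and_bottom(logit_effects(decoder_dir, unembed, vocab), k=2)
    print("positive:", pos)
    print("negative:", neg)
```

The ranking is a single sort, so both lists come from the same scores. With a real model the vocabulary has tens of thousands of entries, and a vectorized matrix product would replace the Python loop.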
Activation Density: 0.015%
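Activation density is usually read as the share of tokens, over some sample of text, on which the feature's activation is positive. At 0.015% that works out to roughly 15 tokens in every 100,000. The sketch below assumes exactly that definition. The zero threshold and the helper name `activation_density` are illustrative and are not taken from the page.

```python
from typing import List


def activation_density(activations: List[float], threshold: float = 0.0) -> float:
    """Percentage of tokens on which the feature fires (activation > threshold)."""
    if not activations:
        raise ValueError("need at least one activation value")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


if __name__ == "__main__":
    # Hypothetical sample: the feature fires on 3 of 20,000 tokens.
    sample = [0.0] * 19_997 + [1.2, 0.4, 2.0]
    print(f"{activation_density(sample):.3f}%")
```

A density this low means the feature is silent on almost all text, which fits a narrow explanation such as questions that begin with "what about".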