Explanations
questions starting with "How does" or "How do"
Negative Logits
Hug             -0.74
iken            -0.71
bsp             -0.70
tto             -0.69
gio             -0.68
hao             -0.68
Knight          -0.67
boa             -0.66
tails           -0.65
574             -0.65
Positive Logits
reconcil         0.92
reconcile        0.79
?),              0.78
?]               0.77
coping           0.76
?                0.72
reacting         0.71
?!               0.70
reconciliation   0.69
?)               0.68
Activations Density 0.034%
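To make the density figure concrete, here is a minimal sketch. It assumes density is the fraction of token positions where the feature's activation is strictly above zero. That is an assumption about how this dashboard defines it. The function name and the toy data are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold`.

    Assumption: a position counts as active only if its activation is
    strictly greater than zero (the default threshold).
    """
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical toy data: 3 active positions out of 10,000 tokens.
acts = [0.0] * 10_000
for i in (12, 4_057, 9_999):
    acts[i] = 1.5

density = activation_density(acts)
print(f"{density:.3%}")  # 0.030%
```

Under this definition, the reported 0.034% works out to roughly 3.4 activating tokens per 10,000. That makes this a very sparse feature.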