Explanations
questions that begin with "who."
Negative Logits
Token     Logit
mente     -0.19
ted       -0.17
illos     -0.15
erais     -0.15
tti       -0.15
taire     -0.15
ting      -0.15
Carthy    -0.15
ning      -0.14
uran      -0.14
Positive Logits
Token     Logit
else      0.24
osh       0.18
soever    0.17
_else     0.16
ops       0.15
oping     0.14
else      0.14
ÑĢей      0.14
inspace   0.14
ELSE      0.14
Activations Density: 0.049%