INDEX
Explanations
instances of dialogue and expressions of inquiry or explanation
Negative Logits
olis        -0.15
nga         -0.15
verbatim    -0.13
ild         -0.13
keh         -0.13
нина        -0.12
echo        -0.12
ora         -0.12
语          -0.12
233         -0.12
Positive Logits
explain         0.74
explanation     0.73
explains        0.68
explaining      0.67
expl            0.67
explanations    0.67
explained       0.66
Expl            0.64
Explain         0.62
explain         0.62
Activations Density 0.263%