INDEX
Explanations
explanations or clarifications in text
instances of the word "explained."
Negative Logits
inal          -0.80
illet         -0.76
ILCS          -0.71
engeance      -0.70
venge         -0.69
pired         -0.69
vernment      -0.69
contracted    -0.66
otion         -0.66
ascus         -0.65
Positive Logits
explains      1.01
why           0.95
WHY           0.90
explained     0.87
explain       0.83
ĸļ            0.83
explaining    0.82
WER           0.82
explanations  0.81
Explain       0.80
Activations Density: 0.017%