INDEX
Explanations
phrases related to statements and explanations made by individuals
statements attributed to individuals within a dialogue or reporting context
Negative Logits
Token       Logit
pee         -0.65
monary      -0.61
central     -0.60
VIS         -0.58
pd          -0.57
tail        -0.57
neau        -0.57
amac        -0.57
brance      -0.57
ible        -0.56
Positive Logits
Token       Logit
:]          0.83
weet        0.69
"[          0.67
aturdays    0.66
auga        0.66
:"          0.66
:'          0.64
bluntly     0.63
goodbye     0.62
utenberg    0.62
Activations Density 0.244%