Explanations
references to indeterminate or vague subjects and phrases
Negative Logits
pa        -0.18
pe        -0.18
          -0.17
neither   -0.17
py        -0.17
all       -0.17
pet       -0.17
often     -0.15
po        -0.15
zel       -0.15
Positive Logits
else       0.28
_else      0.24
/e         0.22
else       0.21
-any       0.21
THING      0.18
Else       0.18
alysis     0.17
hazi       0.17
else       0.16
Activations Density 0.039%