Explanations
direct questions and inquiries
questions that begin with "if" or "whether"
Negative Logits
abal       -0.90
alde       -0.77
ulic       -0.68
thodox     -0.66
nown       -0.65
Bonus      -0.65
astered    -0.64
*=-        -0.61
tc         -0.61
won        -0.60
Positive Logits
ĻĤ             0.83
amera          0.83
forgiveness    0.82
curfew         0.71
permission     0.70
mosqu          0.70
ihad           0.68
displeasure    0.68
asking         0.67
watering       0.66
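The two lists rank vocabulary tokens by this feature's effect on the output logits: the most negative first, then the most positive. The sketch below shows how such lists could be selected. It assumes the per-token effects are already available as a plain token-to-score mapping. That input format and the function name `top_logit_effects` are hypothetical; the dashboard does not say how it stores or computes these scores.

```python
from __future__ import annotations


def top_logit_effects(
    effects: dict[str, float], k: int = 10
) -> tuple[list[tuple[str, float]], list[tuple[str, float]]]:
    """Return up to k most negative and up to k most positive (token, effect) pairs.

    `effects` is a hypothetical mapping from token string to logit effect.
    """
    # Sort all tokens by effect, smallest (most negative) first.
    ranked = sorted(effects.items(), key=lambda item: item[1])
    # Keep only strictly negative effects, most negative first.
    negative = [pair for pair in ranked if pair[1] < 0][:k]
    # Walk the ranking backwards so the largest positive effects come first.
    positive = [pair for pair in reversed(ranked) if pair[1] > 0][:k]
    return negative, positive


if __name__ == "__main__":
    sample = {"abal": -0.90, "alde": -0.77, "forgiveness": 0.82, "curfew": 0.71}
    neg, pos = top_logit_effects(sample, k=2)
    print(neg)
    print(pos)
```

Filtering on the sign keeps a token from appearing in both lists when the vocabulary has fewer than `k` positive or negative entries.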
Activation Density 0.078%
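Activation density is usually reported as the share of tokens on which a feature fires. The sketch below computes that share as a percentage. It assumes "fires" means an activation strictly above a threshold of zero; the page does not state its threshold. The function name `activation_density` is hypothetical.

```python
def activation_density(activations: list[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds `threshold` (assumed 0.0)."""
    if not activations:
        raise ValueError("need at least one activation")
    # Count tokens on which the feature is considered active.
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


if __name__ == "__main__":
    # 78 active tokens out of 100,000 total.
    acts = [0.0] * 99_922 + [1.5] * 78
    print(activation_density(acts))
```

Read this way, 0.078% corresponds to roughly 78 active tokens per 100,000, so the feature is very sparse.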