INDEX
Explanations
questions starting with "Why do" or "Why are"
questions and inquiries about motivations and reasons behind actions
Negative Logits
iHUD     -0.83
abase    -0.74
orage    -0.74
yssey    -0.72
opian    -0.71
orge     -0.70
aukee    -0.69
orthy    -0.65
ibaba    -0.65
Pwr      -0.64
Positive Logits
?]         0.84
nobody     0.80
everyone   0.80
so         0.80
people     0.79
SO         0.74
everyone   0.73
liberals   0.73
everybody  0.72
?".        0.71
Activations Density: 0.062%