Explanations
questions starting with the word "What"
questions beginning with "What."
Negative Logits
ped        -0.67
por        -0.65
roads      -0.64
eer        -0.63
rod        -0.62
lot        -0.62
ulic       -0.61
gal        -0.61
Gy         -0.61
uttering   -0.60
Positive Logits
soever         1.29
happens        1.11
happened       1.03
distinguishes  0.94
transpired     0.94
happ           0.92
kinds          0.87
Lies           0.85
sorts          0.84
else           0.84
Activations Density 0.092%