INDEX
Explanations
questions starting with "Did"
questions that begin with "Did."
Negative Logits
houses     -0.72
heter      -0.70
stood      -0.70
rooms      -0.69
isu        -0.68
objects    -0.64
cedented   -0.64
boards     -0.63
asant      -0.62
Methods    -0.62
Positive Logits
actic      1.13
nt         0.73
act        0.72
iotic      0.72
riks       0.72
n          0.71
Finish     0.69
happen     0.68
IER        0.66
iosyncr    0.64
Activations Density: 0.044%