INDEX
Explanations
questions starting with 'If' and statements involving thoughts or opinions
conditional statements and hypotheses
Negative Logits
ģĸ          -0.76
illation    -0.72
peror       -0.69
apult       -0.65
drawn       -0.64
ngth        -0.64
isitions    -0.61
angering    -0.61
ptoms       -0.61
apan        -0.60
Positive Logits
yes         1.27
so          1.12
Nope        0.96
not         0.96
YES         0.92
it          0.85
thats       0.83
Yes         0.82
Yes         0.82
NOT         0.82
Activations Density: 0.163%