INDEX
Explanations
question marks
questions posed to the audience
Negative Logits
bidden            -0.72
domest            -0.69
recovering        -0.69
nurs              -0.66
beginning         -0.64
acquisitions      -0.64
undai             -0.63
initialization    -0.63
fraternity        -0.61
manif             -0.61
Positive Logits
Let               1.32
Leave             1.32
Tell              1.28
Discuss           1.26
Would             1.26
Share             1.22
Comment           1.20
Feel              1.20
Submit            1.20
Vote              1.19
Activations Density 0.100%