INDEX
Explanations
references to afternoon and after-school activities
Negative Logits
ssp      -0.17
sg       -0.17
swire    -0.16
licken   -0.15
isay     -0.15
adiens   -0.15
swick    -0.15
296      -0.15
rough    -0.14
undo     -0.14
Positive Logits
thought    0.23
gl         0.20
mentioned  0.20
noon       0.20
Glow       0.18
effects    0.18
ViewInit   0.18
math       0.17
dark       0.17
party      0.17
Activations Density: 0.033%