INDEX
Explanations
the word "You" at varying degrees of activation
the pronoun "You"
Negative Logits
airs
-0.69
¿½
-0.61
Gamb
-0.60
stemming
-0.59
assembly
-0.59
srfAttach
-0.59
assemb
-0.58
Cornwall
-0.57
shore
-0.55
wrapper
-0.55
Positive Logits
're          1.42
'll          1.24
've          1.23
guys         1.13
'd           1.04
tub          1.03
guessed      0.98
imar         0.97
ngth         0.95
ths          0.92
Activations Density: 0.139%