Explanations
variations of the word "you"
Negative Logits
Token         Logit
¿½            -0.71
ipal          -0.67
Gamb          -0.61
ice           -0.60
airs          -0.60
Commerce      -0.59
Chap          -0.58
icy           -0.58
images        -0.57
ãĥ³ãĤ¸        -0.56
Positive Logits
Token         Logit
're           1.66
'll           1.38
've           1.37
guys          1.31
tub           1.24
'd            1.13
guessed       1.09
yourselves    1.03
know          1.00
RS            0.97
Activations Density: 0.505%