INDEX
Explanations
the occurrence of the word "you" in various contexts
Negative Logits
lover     -0.16
ower      -0.15
board     -0.15
ald       -0.15
mus       -0.15
ict       -0.15
imit      -0.14
igate     -0.14
helper    -0.14
idious    -0.14
Positive Logits
erdale    0.16
ehir      0.16
’ll       0.16
'll       0.15
OMUX      0.15
SELF      0.15
erer      0.15
ebek      0.15
oldemort  0.14
imli      0.14
Activations Density: 0.102%