Explanations
instances of the word "you" and its variations
Negative Logits
unic  -0.16
enge  -0.16
oblin  -0.15
habi  -0.15
ندر  -0.15
خت  -0.15
Hanson  -0.14
late  -0.14
reet  -0.14
iste  -0.14
Positive Logits
LOUR  0.14
QUEST  0.14
INCIDENTAL  0.14
令  0.13
arts  0.13
Miss  0.13
assumed  0.13
ιδ  0.13
idak  0.13
terra  0.12
Activation density: 0.023%