Explanations
direct references to the word "you" in various contexts
Negative Logits
Uncomment    -0.15
rawer        -0.15
usher        -0.15
ogui         -0.14
yny          -0.14
вдруг        -0.14
onom         -0.14
asser        -0.14
numberWith   -0.14
arp          -0.14
Positive Logits
forgot       0.19
said         0.19
seem         0.19
seems        0.17
mention      0.17
mentioned    0.17
mileage      0.17
seemed       0.16
(blank)      0.16
忘           0.16
Activations Density 0.044%