INDEX
    Explanations

    direct references to the word "you" in various contexts

    Negative Logits
     Uncomment
    -0.15
    rawer
    -0.15
    usher
    -0.15
    ogui
    -0.14
    yny
    -0.14
     вдруг
    -0.14
    onom
    -0.14
    asser
    -0.14
     numberWith
    -0.14
    arp
    -0.14
    Positive Logits
     forgot
    0.19
     said
    0.19
     seem
    0.19
     seems
    0.17
     mention
    0.17
     mentioned
    0.17
     mileage
    0.17
     seemed
    0.16
     
    0.16
    忘
    0.16
    Act Density 0.044%

    No Known Activations