INDEX
    Explanations

    conversations about identity and personal experiences

    Japanese sentence fragments

    Negative Logits
    Token           Logit
    " itſelf"       -0.98
    "ſelf"          -0.94
    " myſelf"       -0.89
    " auffi"        -0.82
    " sahiptir"     -0.82
    " iſt"          -0.81
    " ―――――"        -0.81
    " ―――"          -0.80
    " pleaſure"     -0.79
    "ſelves"        -0.79
    Positive Logits
    Token           Logit
    " really"        1.03
    " pretty"        0.99
    " guys"          0.89
    " nice"          0.89
    " weird"         0.86
    " shit"          0.85
    " fucking"       0.84
    " REALLY"        0.82
    " like"          0.82
    " Really"        0.81
    Activation Density: 0.197%

    No Known Activations