INDEX
    Explanations

    Names/People

    Negative Logits
    " kanı"          -0.07
    " laughed"       -0.06
    "-Up"            -0.06
    " considers"     -0.06
    "MB"             -0.06
    " derivative"    -0.06
    " Leban"         -0.06
    "-up"            -0.06
    "rait"           -0.06
    " recreate"      -0.06

    Positive Logits
    " A"              0.09
    " W"              0.09
    " F"              0.09
    ".W"              0.08
    ">G"              0.08
    " C"              0.08
    " D"              0.08
    " G"              0.07
    "=G"              0.07
    " R"              0.07
    Activation Density 0.080%

    No Known Activations