    Negative Logits
    Token         Logit
    "about"       -2.31
    " about"      -2.25
    "About"       -1.90
    " ABOUT"      -1.70
    " About"      -1.66
    " tentang"    -1.65
    " tungkol"    -1.45
    "ABOUT"       -1.43
    " abt"        -1.41
    " acerca"     -1.40
    Positive Logits
    Token         Logit
    " the"         1.79
    " a"           1.27
    " an"          1.13
    " this"        1.06
    " it"          1.02
                   1.02
    " their"       0.99
    " his"         0.98
    " these"       0.97
    " our"         0.97
    Activation Density: 0.118%

    No Known Activations