INDEX
    Explanations

    action or state followed by context

    Negative Logits
    Token           Logit
    " one"          -1.08
    " necessary"    -1.07
    " something"    -1.05
    "七十"          -1.05
    "でしたね"      -1.05
    "んだよね"      -1.00
    " that"         -0.99
    "Only"          -0.99
    "طع"            -0.98
    "éton"          -0.96

    Positive Logits
    Token           Logit
    " the"          1.47
    (not shown)     1.19
    "ytä"           1.07
    "essä"          1.05
    (not shown)     1.03
    "OTROS"         1.02
    " ae"           1.01
    (not shown)     1.00
    " wordt"        0.99
    "bub"           0.96

    Activation Density: 0.073%

    No Known Activations