INDEX
    Explanations
    Negative Logits
    Token        Logit
    "have"       -1.01
    " have"      -1.00
    "Have"       -0.92
    " Have"      -0.90
    " HAVE"      -0.82
    "HAVE"       -0.75
    " hebben"    -0.69
    " κάν"       -0.69
    " haben"     -0.68
    " bave"      -0.67
    Positive Logits
    Token        Logit
    " been"      1.16
    " their"     1.05
    " a"         0.94
    " remained"  0.83
    " an"        0.79
    " gotten"    0.76
    " the"       0.74
    " appeared"  0.74
    " resulted"  0.74
    " no"        0.71
    Activation Density: 0.115%

    No Known Activations