    Negative Logits
    Token         Logit
    " "           1.02
    "ן"           0.81
    "ir"          0.75
    "lari"        0.71
    "larda"       0.71
    "sports"      0.69
    "ures"        0.65
    "irg"         0.64
    "soluble"     0.63
    "nets"        0.63
    Positive Logits
    Token         Logit
    "{"           0.87
    " werden"     0.82
    " be"         0.77
    " país"       0.76
    " are"        0.75
    "3"           0.74
                  0.73
    " océ"        0.71
    " três"       0.70
    " nuestra"    0.69
    Act Density 0.001%

    No Known Activations