    Explanations

    weigh, transform, update, probability

    Negative Logits
     Señor          0.46
     atrocities     0.46
     intre          0.45
     também         0.44
     atividades     0.44
     proximité      0.43
     Monica         0.42
     sembra         0.42
     algumas        0.42
     fără           0.41
    Positive Logits
    6               0.63
    9               0.62
    8               0.62
    7               0.62
    4               0.59
     这个            0.50
    5               0.49
    `/              0.48
    ("              0.48
    vector          0.47
    Activation Density: 0.145%

    No Known Activations