    NEGATIVE LOGITS
    Token               Logit
    "IBILITIES"         -0.77
    " esforço"          -0.77
    " Wrestle"          -0.76
    " indicó"           -0.76
    " Vorstellungen"    -0.74
    "щую"               -0.73
    "Stret"             -0.73
    "reciate"           -0.72
    " něko"             -0.72
    "(`"                -0.71
    POSITIVE LOGITS
    Token               Logit
    " paced"            1.28
    "paced"             0.96
    " moving"           0.88
    " uptake"           0.88
    " fast"             0.87
    " on"               0.84
    " learners"         0.82
    " thinking"         0.80
    "moving"            0.77
    " footed"           0.77
    Activation Density: 0.015%

    No Known Activations