    Explanations

    variable assignment and types

    Negative Logits
    Token             Logit
    rca               -1.02
    lala              -0.94
     menambahkan      -0.93
    größ              -0.93
    ikke              -0.92
     akan             -0.91
     cinq             -0.91
    rafa              -0.88
    ",""              -0.88
     croire           -0.87
    Positive Logits
    Token             Logit
     from             1.30
     via              0.95
     emerges          0.85
     through          0.83
    receiver          0.79
     atenção          0.78
    anyi              0.77
     Singular         0.77
    andReturn         0.77
     everytime        0.75
    Activation Density: 0.025%

    No Known Activations