INDEX
    Explanations

    reveals trends and findings

    Negative Logits
    " melindungi"   0.87
    " meskipun"     0.83
    "unless"        0.82
    "ится"          0.81
    " nič"          0.81
    " kuat"         0.79
    "の為"          0.78
    " 不是"         0.78
    "enschutz"      0.78
    " bahkan"       0.77
    Positive Logits
    " reveals"      2.15
    " reveal"       1.71
    " revealing"    1.61
    " Reveals"      1.55
    " confirms"     1.51
    "reveal"        1.46
    " revealed"     1.41
    " shows"        1.37
    " révèle"       1.31
    "reve"          1.30
    Activation Density: 0.023%

    No Known Activations