INDEX
    Explanations
    Negative Logits
    Also          -0.98
    related       -0.91
    do            -0.88
    Související   -0.87
     based        -0.82
     likewise     -0.81
    to            -0.81
    berspace      -0.81
    apply         -0.81
     like         -0.81
    Positive Logits
     balkon       0.98
     poème        0.95
    şağı          0.92
    хівовано      0.89
     $\{          0.89
    nelse         0.88
    gegevens      0.86
     głównie      0.85
    medriver      0.85
     冷            0.84
    Activation Density: 0.002%

    No Known Activations