    Explanations

    compound nouns and titles

    Negative Logits
        " debacle"              1.41
        " goma"                 1.34
        "ис"                    1.33
        "ться"                  1.32
        " încep"                1.32
        " заход"                1.31
        (token not rendered)    1.30
        " στό"                  1.30
        "lassen"                1.28
        " remaja"               1.27

    Positive Logits
        "ally"                  1.21
        (token not rendered)    1.10
        "izing"                 1.07
        " Jack"                 1.04
        "бие"                   1.03
        " Milan"                1.02
        " Fraction"             1.00
        "BOTH"                  0.97
        "legi"                  0.96
        " Turing"               0.95
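    The two lists above rank vocabulary tokens by how strongly the feature pushes their logits down or up. The sketch below shows only that ranking step and assumes you already have one logit-effect value per token. One possible source for those values is a feature's decoder direction projected through the unembedding, but this page does not say how its values were computed. The function name `top_bottom_logits` is hypothetical, and the sketch returns signed values, whereas the page lists its negative logits without a sign.

```python
def top_bottom_logits(logit_effects, vocab, k=10):
    """Return the k most positive and k most negative (token, value) pairs.

    Hypothetical helper. `logit_effects` holds one signed value per token in
    `vocab`. How those values are obtained is an assumption left to the caller.
    """
    pairs = list(zip(vocab, logit_effects))
    # Sort ascending by value: the most negative effects come first.
    ranked = sorted(pairs, key=lambda p: p[1])
    negative = ranked[:k]          # most negative first
    positive = ranked[::-1][:k]    # most positive first
    return positive, negative


if __name__ == "__main__":
    vocab = ["a", "b", "c", "d"]
    effects = [0.5, -1.0, 2.0, -0.2]
    pos, neg = top_bottom_logits(effects, vocab, k=2)
    print(pos)  # [('c', 2.0), ('a', 0.5)]
    print(neg)  # [('b', -1.0), ('d', -0.2)]
```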
    Activation Density 0.001%
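    Activation density is read here as the percentage of tokens on which the feature is active. This is an assumption: the page does not define the threshold or the token sample. Under that reading, 0.001% corresponds to roughly 1 active token in every 100,000. The helper `activation_density` and its `threshold` parameter are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of tokens whose activation exceeds `threshold`.

    Hypothetical helper. The threshold of 0.0 is an assumed convention,
    not one stated on the page.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


if __name__ == "__main__":
    # One active token out of 100,000 gives a density of 0.001%.
    acts = [0.0] * 99_999 + [0.7]
    print(activation_density(acts))
```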

    No Known Activations