INDEX
    Explanations
    NEGATIVE LOGITS
    Token               Logit
    " ~="               -0.91
    " tambah"           -0.83
    "ейс"               -0.81
    " inteligen"        -0.79
    " sesi"             -0.78
    " complications"    -0.76
    "essages"           -0.75
    "bouts"             -0.75
    " teatr"            -0.75
    "ść"                -0.74
    POSITIVE LOGITS
    Token               Logit
    "dreaming"           1.00
    "Hvorfor"            0.88
    "e"                  0.80
    "sh"                 0.79
    "tona"               0.77
    "Hudson"             0.74
    " savings"           0.72
    "namic"              0.72
    " slopes"            0.71
    "ponenten"           0.71
    Activation Density: 0.011%

    No Known Activations