    Negative Logits
    feste           -0.89
                    -0.88
     vende          -0.86
     fantastique    -0.85
     Verr           -0.84
     pence          -0.82
     immagin        -0.81
                    -0.79
    Meanwhile       -0.79
    các             -0.78
    Positive Logits
     démocr         0.79
     綠             0.76
     quieran        0.75
    𝙷               0.75
    ธอ              0.75
    intérêt         0.74
    ėse             0.74
    obrázek         0.73
     arbejde        0.73
     condiv         0.73
    Activation Density 0.148%

    No Known Activations