INDEX
    Explanations

    expressing likes and dislikes

    Negative Logits
     peut          1.32
     deu           1.25
     pesawat       1.23
     pertenece     1.16
    ğı             1.16
     alcanza       1.12
     intim         1.12
     информации    1.12
     resp          1.12
    για            1.12
    Positive Logits
    y              1.31
    ي              1.30
     څنګه          1.22
    redditmedia    1.22
    minded         1.19
    𝘳              1.18
    मंडल           1.16
    viel           1.13
     چڑھ           1.12
    ۥ              1.11
    Activation Density: 0.424%

    No Known Activations