INDEX
    Explanations

    instances of engagement or interaction between entities

    Negative Logits
    Token           Logit
    (whitespace)    -0.20
    "former"        -0.17
    "venir"         -0.16
    " Byl"          -0.16
    "Ñľ"            -0.16
    "ongs"          -0.15
    "ging"          -0.14
    "ertino"        -0.14
    "ò"             -0.14
    "wik"           -0.14

    Positive Logits
    Token           Logit
    "ively"         0.24
    "iveness"       0.24
    "ives"          0.24
    "ivate"         0.24
    "al"            0.20
    "å¼ı"           0.19
    "uator"         0.19
    "ative"         0.19
    "ual"           0.19
    "uality"        0.18

    Activation Density 0.016%

    No Known Activations