    Negative Logits
    "angent"       -0.07
    "řeb"          -0.07
    "+v"           -0.06
    "buat"         -0.06
    "imate"        -0.06
    "_RECEIVED"    -0.06
    "_epsilon"     -0.06
    "زيد"          -0.06
    "+z"           -0.06
    " dez"         -0.06
    Positive Logits
    " Hall"        0.24
    " hall"        0.18
    "Hall"         0.17
    " halls"       0.14
    "hall"         0.13
    " hallway"     0.10
    " Hart"        0.09
    " hallmark"    0.09
    " hood"        0.09
    " auditor"     0.09
    Activation Density: 0.007%

    No Known Activations