    Negative Logits
    (token not rendered)   -0.07
    "ermo"                 -0.07
    " shores"              -0.06
    " účet"                -0.06
    " Boh"                 -0.06
    " leaves"              -0.06
    " figura"              -0.06
    "\t↵\t↵"               -0.06
    "Las"                  -0.06
    "ialized"              -0.06

    Positive Logits
    "(ss"                  0.07
    "-chat"                0.07
    (token not rendered)   0.07
    "/admin"               0.07
    "ологіч"               0.07
    " khuyến"              0.07
    " Mehmet"              0.06
    "-bind"                0.06
    (token not rendered)   0.06
    "ุ้"                    0.06
    Activation density: 0.002%

    No known activations