    Negative Logits
    Token                 Logit
    'pura'                -0.51
    (no visible token)    -0.50
    'гора'                -0.49
    ' '                   -0.49
    'ดี'                   -0.48
    ']_'                  -0.46
    (no visible token)    -0.45
    ' Unix'               -0.44
    ' FZ'                 -0.43
    'Corn'                -0.43
    Positive Logits
    Token     Logit
    ' <<'     4.30
    '<<'      3.61
    ')<<'     2.84
    ']<<'     2.59
    '<<"'     2.38
    '<<('     2.37
    ' <<='    2.30
    '()<<'    2.26
    ' <<"'    2.25
    '<<<'     2.08
    Activation Density: 0.071%

    No Known Activations