    NEGATIVE LOGITS
    Token                 Logit
    " Love"               -0.09
    " love"               -0.08
    " ผล"                 -0.08
    " Symbol"             -0.07
    "love"                -0.07
    " Fake"               -0.07
    " товар"              -0.07
    " characteristic"     -0.07
    "true"                -0.06
    "derive"              -0.06
    POSITIVE LOGITS
    Token                 Logit
    " Mont"               0.09
    " Sierra"             0.07
    " Glen"               0.07
    " Europa"             0.07
    "IRM"                 0.07
    (blank)               0.07
    " checkpoint"         0.07
    (blank)               0.07
    " Murray"             0.07
    "지노"                0.07
    Activation Density: 0.066%

    No Known Activations