    Negative Logits
    " Carn"          -0.07
    " cinematic"     -0.07
    " cartoon"       -0.07
    " wish"          -0.07
    " cartoons"      -0.07
    (unrendered)     -0.07
    " mits"          -0.07
    " Cultural"      -0.07
    " sunny"         -0.07
    (unrendered)     -0.07
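    Positive Logits
    " внимание"      0.09   (Russian, "attention")
    " aandacht"      0.08   (Dutch, "attention")
    " Valladolid"    0.08
    "িটি"            0.08
    "ITable"         0.08
    ".reduce"        0.08
    "ტი"             0.08
    " вним"          0.08   (truncated Russian "attention", as in "внимание")
    "eam"            0.07
    " atención"      0.07   (Spanish, "attention")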
    Activation density: 0.035%
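Activation density is usually read as the percentage of tokens on which the feature fires. At 0.035%, that is roughly 35 firing tokens per 100,000 (0.035 / 100 * 100,000 = 35). The sketch below assumes that "fires" means an activation strictly above a threshold of 0. The threshold and the function name `activation_density` are hypothetical.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds `threshold` (hypothetical helper)."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# Toy data: 35 firing tokens out of 100,000.
acts = [0.0] * 99_965 + [1.0] * 35
density = activation_density(acts)
print(f"{density:.3f}%")  # prints 0.035%
```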

    No known activation examples