INDEX
    Negative Logits
    Token            Logit
    " Warfare"       -0.08
    " warfare"       -0.08
    " gwar"          -0.08
    "qo"             -0.08
    " dura"          -0.07
    "wear"           -0.07
    " nécessaire"    -0.07
    " ένας"          -0.07
    "ор"             -0.07
    "JL"             -0.07
    Positive Logits
    Token            Logit
    " चाहता"          0.11
    " muốn"          0.10
    " want"          0.09
    " willen"        0.09
    " хотелось"      0.09
    " хот"           0.08
    (missing)        0.08
    " deseas"        0.08
    " sincerely"     0.08
    " chcete"        0.08
    Activation Density: 0.038%

    No Known Activations