INDEX
    Explanations

    nothing, Not, nowhere, nada, rien

    Negative Logits
    Token      Logit
    "aro"      -0.10
    "abol"     -0.09
    "okers"    -0.08
    "inker"    -0.08
    " wsz"     -0.08
    "isha"     -0.08
    "ousse"    -0.08
    "uti"      -0.08
    "nage"     -0.08
    "oker"     -0.08
    Positive Logits
    Token         Logit
    " nothing"    0.94
    " Nothing"    0.77
    "nothing"     0.76
    " NOTHING"    0.72
    "Nothing"     0.70
    " nichts"     0.64
    " nada"       0.62
    " rien"       0.56
    " ничего"     0.54
    " nulla"      0.39
    Act Density 0.211%
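
A minimal sketch of how a figure like the one above could be computed, assuming "Act Density" means the percentage of token positions on which the feature's activation is above zero. This page does not confirm that definition; the function name `activation_density`, the `threshold` parameter, and the sample values are hypothetical and are not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of positions whose activation exceeds `threshold`.

    Assumption: "density" is the share of token positions on which the
    feature fires at all. The dashboard itself does not state its definition.
    """
    if not activations:
        return 0.0
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)


# Hypothetical data: 1000 token positions, the feature fires on 3 of them.
sample = [0.0] * 997 + [1.2, 0.4, 3.1]
print(f"Act Density {activation_density(sample):.3f}%")  # prints "Act Density 0.300%"
```

Under this reading, 0.211% would correspond to the feature firing on roughly 2 of every 1000 tokens.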

    No Known Activations