INDEX
    Explanations

    firstly, secondly, thirdly

    Negative Logits
    Token             Logit
    " or"             -1.00
    " other"          -0.85
    "男主"            -0.82
    " was"            -0.81
    " workstation"    -0.78
    " بيها"           -0.77
    " kite"           -0.75
    "but"             -0.74
    " affect"         -0.74
    " time"           -0.71
    Positive Logits
    Token             Logit
    " сначала"        1.36
    "まず"            1.13
    "THING"           0.95
    "まずは"          0.92
    "íté"             0.85
    "首先"            0.84
    " Podemos"        0.84
    "wechs"           0.84
    ")});"            0.83
    " firstly"        0.83
    Activation Density: 0.173%

    No Known Activations