    Negative Logits
     welcoming      -0.08
    152             -0.07
     solemn         -0.07
     nightlife      -0.07
     visited        -0.07
     cheering       -0.07
    entials         -0.07
     urging         -0.07
    icher           -0.07
     imperative     -0.07
    Positive Logits
     клап           0.10
     stopper        0.10
     отверст        0.09
     лист           0.09
     tray           0.09
    prung           0.09
     лед            0.09
     арг            0.08
     характ         0.08
     экран          0.08
    Activation Density: 0.005%

    No Known Activations