INDEX
    Explanations

    threats and secrets

    Negative Logits
     straight  -0.07
    aviti  -0.07
    journ  -0.07
     anis  -0.07
    [token not shown]  -0.06
    [token not shown]  -0.06
     KS  -0.06
    136  -0.06
     ziehen  -0.06
    avit  -0.06
    Positive Logits
     dared  0.12
     outspoken  0.10
     unpopular  0.09
     disple  0.09
    [token not shown]  0.08
     threatens  0.08
     слишком  0.08
    (range  0.08
    VH  0.08
     sr  0.08
    Act Density 0.143%

    No Known Activations