INDEX
    Explanations

    Publishing house name

    Negative Logits (token, logit)
    ".quote"      -0.07
    "olo"         -0.07
    " BS"         -0.06
    "ipro"        -0.06
    "89"          -0.06
    "\tq"         -0.06
    "86"          -0.06
    "ро"          -0.06
    " bs"         -0.06
    " RS"         -0.06
    Positive Logits (token, logit)
    "acerb"        0.06
    " الش"         0.06
    " втор"        0.06
    " missiles"    0.06
    " fils"        0.06
    " relational"  0.06
    " einf"        0.06
    " penny"       0.06
    "实验"          0.06   (Chinese: "experiment")
    "ssp"          0.06
    Activation density: 0.036%

    No known activations.