INDEX
    Explanations

    references to significant historical events and figures associated with social and political commentary

    Negative Logits
     enligt
    -0.51
     möjlighet
    -0.50
     natten
    -0.50
    <bos>
    -0.47
     ifølge
    -0.45
     ubicada
    -0.45
     educativos
    -0.45
    optarg
    -0.45
    笑道
    -0.44
     forbindelse
    -0.44
    Positive Logits
     يتيمه
    0.72
    رشف
    0.68
     really
    0.66
    RegressionTest
    0.65
     goddamn
    0.64
     parado
    0.64
     darn
    0.63
    aaaaaaaa
    0.63
    ğraf
    0.62
    really
    0.62
    Act Density 0.682%

    No Known Activations