INDEX
    Explanations

    phrases that indicate policy suggestions and recommendations for improvement

    Negative Logits
    "eka"           -0.15
    "bay"           -0.15
    "stroy"         -0.15
    "istrovství"    -0.14
    "ibal"          -0.14
    "ritz"          -0.14
    "atte"          -0.14
    "ทางการ"        -0.14
    "ucha"          -0.14
    "coded"         -0.14
    Positive Logits
    " proposal"     0.29
    " proposals"    0.26
    " suggestions"  0.25
    " ideas"        0.23
    " suggestion"   0.23
    "proposal"      0.22
    " Proposal"     0.20
    "ideas"         0.20
    " Ideas"        0.20
    " propose"      0.19
    Act Density 0.272%

    No Known Activations