    Explanations

    percentages and specific terms

    Negative Logits
    " independência"    0.72
    " confiança"        0.69
    " ropa"             0.69
    " semangat"         0.68
    "rotnie"            0.68
    " cerita"           0.67
    " preferências"     0.67
                        0.66
    "тира"              0.66
    "გილ"               0.65
    Positive Logits
    " &"     0.75
    "OS"     0.67
    "/"      0.64
    "OC"     0.62
    "IA"     0.62
    "("      0.61
    "IER"    0.61
    " C"     0.60
    "II"     0.60
    "1"      0.59
    Act Density 0.001%

    No Known Activations