INDEX
    Explanations
    NEGATIVE LOGITS
     vapors       0.47
     defense      0.44
     gotten       0.43
    ități         0.43
     counselors   0.42
     Theater      0.42
     theater      0.41
    `);           0.41
     behaviors    0.41
     Defense      0.40
    POSITIVE LOGITS
     endeavoured  0.62
    coloured      0.59
     flavoured    0.59
    centred       0.59
     visualise    0.59
     analysed     0.57
     tyre         0.57
     specialises  0.56
     realisation  0.56
     specialise   0.55
    Act Density 0.002%

    No Known Activations