    Explanations

    adjectives followed by 'all'

    Negative Logits
    Token             Logit
    "pu"              -0.66
    "lf"              -0.61
    "uay"             -0.61
    " virginity"      -0.58
    "avorite"         -0.57
    " proverb"        -0.57
    "cel"             -0.57
    " Ferdinand"      -0.57
    "geries"          -0.56
    "algia"           -0.56

    Positive Logits
    Token             Logit
    " expense"        0.88
    " levels"         0.88
    " angles"         0.84
    "ocating"         0.83
    " times"          0.82
    "å¸"              0.78
    "onge"            0.76
    " seams"          0.74
    " wavelengths"    0.74
    " hazards"        0.74

    Activation Density: 0.024%

    No Known Activations