    Explanations

    words related to trustworthiness or credibility

    instances of the substring "r"

    Negative Logits
    Token         Logit
    "eers"        -0.83
    " DRAG"       -0.68
    "eer"         -0.68
    " Loft"       -0.68
    " Rising"     -0.67
    "WAYS"        -0.67
    " Winds"      -0.66
    " staging"    -0.64
    " warr"       -0.64
    " dare"       -0.64
    Positive Logits
    Token         Logit
    "inct"        1.15
    "ilateral"    1.01
    "angle"       0.99
    "usted"       0.98
    "angles"      0.98
    "acy"         0.98
    "acist"       0.96
    "uder"        0.96
    "angled"      0.95
    "ushed"       0.95
    Act Density 0.047%

    No Known Activations