INDEX
    Explanations

    phrases and statements expressing gratitude or appreciation

    phrases emphasizing the ability to do something

    Negative Logits
    Token           Logit
    " germ"         -0.69
    " Tradition"    -0.66
    " VG"           -0.65
    " Eug"          -0.62
    " Technique"    -0.62
    " Yose"         -0.61
    " Famous"       -0.61
    " Kendall"      -0.59
    " Deer"         -0.59
    " Solitaire"    -0.59
    Positive Logits
    Token           Logit
    "bodied"         1.12
    "ioned"          0.99
    "awaru"          0.90
    " access"        0.85
    "umbered"        0.84
    "'t"             0.83
    "reys"           0.81
    "ittees"         0.79
    "urred"          0.78
    "ience"          0.76
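    Each list appears to be the ten entries at one end of a ranking of tokens by the feature's effect on their logits. A minimal sketch of that ranking step, assuming the per-token logit effects are already available as a mapping from token string to value (the page does not say how they are computed); `top_logits` is a hypothetical helper, not part of any dashboard API.

```python
def top_logits(logit_effects: dict[str, float], k: int = 10):
    """Split a token -> logit-effect mapping into its k largest and k smallest entries.

    Hypothetical helper: assumes `logit_effects` already holds one value per token.
    """
    # Sort ascending by value; Python's sort is stable, so ties keep insertion order.
    ranked = sorted(logit_effects.items(), key=lambda kv: kv[1])
    negative = ranked[:k]          # most negative first
    positive = ranked[::-1][:k]    # most positive first
    return positive, negative


# Usage with a few values taken from the tables above.
effects = {"bodied": 1.12, "ioned": 0.99, " access": 0.85,
           " germ": -0.69, " Tradition": -0.66, " Deer": -0.59}
pos, neg = top_logits(effects, k=2)
print(pos)  # [('bodied', 1.12), ('ioned', 0.99)]
print(neg)  # [(' germ', -0.69), (' Tradition', -0.66)]
```

    Keeping the leading space inside the token string matters: " access" and "access" are different vocabulary entries.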
    Activation density: 0.027%
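    Activation density is commonly reported as the share of tokens on which a feature fires. Assuming that definition, with "fires" taken to mean an activation above zero (the page states neither the threshold nor the token sample used), 0.027% corresponds to about 27 active tokens per 100,000. A minimal sketch; `activation_density` and its `threshold` parameter are hypothetical.

```python
def activation_density(activations: list[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds `threshold`.

    Assumes `activations` holds one activation value per token in the sample.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# 27 active tokens out of 100,000 gives a density of 0.027%.
sample = [0.0] * 99_973 + [1.0] * 27
print(activation_density(sample))
```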

    No Known Activations