    Explanations

    instances of the word "go" in various contexts

    Negative Logits
    Token       Logit
    "lake"      -0.67
    "picking"   -0.65
    " ammon"    -0.64
    "ament"     -0.64
    "mith"      -0.63
    "race"      -0.62
    " Aram"     -0.61
    "urgy"      -0.61
    "role"      -0.59
    "lain"      -0.59
    Positive Logits
    Token       Logit
    "vernment"  1.07
    "verning"   1.04
    "ven"       0.96
    "ffic"      0.92
    "ogly"      0.87
    "lems"      0.84
    "etz"       0.83
    "zzi"       0.83
    "zzo"       0.83
    "ppe"       0.80
    Activation Density: 0.008%

    No Known Activations