    Explanations

    phrases related to changes or transformations

    phrases indicating significant changes or transformations

    Negative Logits
    Token         Logit
    "crit"        -0.67
    " Guides"     -0.65
    "arers"       -0.63
    "ullah"       -0.62
    " crit"       -0.58
    "sters"       -0.56
    "liest"       -0.56
    "fuck"        -0.55
    "else"        -0.55
    "earchers"    -0.55
    Positive Logits
    Token         Logit
    " sorts"      1.06
    " course"     0.84
    " theirs"     0.74
    "course"      0.73
    "rontal"      0.71
    "ensibly"     0.71
    "Course"      0.69
    "emale"       0.68
    "ricular"     0.66
    "inence"      0.65
    Activation Density: 0.190%

    No Known Activations