INDEX
    Explanations

    phrases indicating roles, templates, or guiding frameworks

    Negative Logits
    " aspect"      -0.15
    "pickle"       -0.14
    "aku"          -0.14
    " hallmark"    -0.14
    "loff"         -0.14
    "icol"         -0.13
    " aspects"     -0.13
    "alla"         -0.13
    " pans"        -0.13
    "itures"       -0.13

    Positive Logits
    " starting"    0.46
    "starting"     0.39
    " guide"       0.37
    " Starting"    0.37
    "Starting"     0.36
    "guide"        0.33
    " reference"   0.31
    " jumping"     0.30
    "reference"    0.29
    " guides"      0.28

    Act Density 0.165%

    No Known Activations