INDEX
    Explanations
    NEGATIVE LOGITS
    "abeth"       -0.78
    "pring"       -0.78
    "icious"      -0.78
    "ICS"         -0.71
    "imated"      -0.70
    " ACTIONS"    -0.70
    "orporated"   -0.69
    " ILCS"       -0.68
    "IFA"         -0.66
    "LM"          -0.66
    POSITIVE LOGITS
    " cave"        1.07
    " Dwell"       0.93
    " caves"       0.92
    " paintings"   0.79
    " canyon"      0.77
    " hiber"       0.76
    " collapses"   0.74
    " dwellings"   0.73
    " dwelling"    0.71
    " walls"       0.70
    Act Density 0.011%

    No Known Activations