INDEX
    Explanations
    Negative Logits
    "pring"        -0.77
    "abeth"        -0.73
    "icious"       -0.69
    " ACTIONS"     -0.69
    "oppable"      -0.69
    "imated"       -0.65
    "oice"         -0.64
    "ICS"          -0.63
    "orporated"    -0.63
    "ulz"          -0.61
    Positive Logits
    " cave"         1.08
    " caves"        1.01
    " Dwell"        0.91
    " paintings"    0.90
    " entrances"    0.83
    " canyon"       0.77
    "yrinth"        0.75
    "tto"           0.73
    "lings"         0.72
    "door"          0.72
    Activation Density: 0.031%

    No Known Activations