INDEX
    Explanations

    verbs related to figuring things out or solving problems

    Negative Logits
        "Down"   -1.47
        "Out"    -1.47
        "Up"     -1.45
        " Out"   -1.42
        " Up"    -1.39
        "Off"    -1.39
        " Down"  -1.36
        " Off"   -1.35
        "OUT"    -1.32
        "Away"   -1.32
    Positive Logits
        "aarrggbb"     0.50
        "achelor"      0.48
        " put"         0.48
        "ImageField"   0.48
        "clown"        0.47
        "bitField"     0.47
        " done"        0.46
        " ours"        0.46
        " vack"        0.46
        " tf"          0.45
    Activation Density: 0.246%

    No Known Activations