INDEX
    Explanations

    words related to enjoyment, entertainment, and social interaction

    Negative Logits
    (tokens are quoted so that leading spaces are visible)
    Token           Logit
    "rael"          -0.66
    " underest"     -0.65
    "inx"           -0.64
    "©¶æ"           -0.63
    " mater"        -0.61
    "abases"        -0.61
    " bottleneck"   -0.61
    "opic"          -0.60
    "heed"          -0.59
    "fixed"         -0.59
    Positive Logits
    Token           Logit
    "nels"          1.00
    "issance"       0.97
    "sticks"        0.83
    "oleon"         0.81
    " enjoyment"    0.78
    " Surprise"     0.78
    " stroll"       0.78
    "tainment"      0.77
    "Fest"          0.77
    "osity"         0.74
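    One common way such logit lists are produced for a learned feature (for example, a sparse-autoencoder latent) is to project the feature's decoder direction through the model's unembedding matrix and rank the per-token scores. The sketch below assumes that procedure; the function names, the list-of-lists matrix layout, and the toy numbers are hypothetical. It does not reproduce any scaling the dashboard may apply (note that the top positive value above is exactly 1.00).

```python
def project_to_vocab(decoder_direction, unembedding):
    """Score every vocabulary token against a feature direction.

    Assumption (hypothetical layout): `unembedding` is a list of rows,
    one row per residual-stream dimension, each row holding one weight
    per vocabulary token.
    """
    if len(decoder_direction) != len(unembedding):
        raise ValueError("decoder_direction and unembedding rows must match")
    vocab_size = len(unembedding[0])
    # Score for token t is the dot product of the direction with column t.
    return [
        sum(d * row[t] for d, row in zip(decoder_direction, unembedding))
        for t in range(vocab_size)
    ]


def top_and_bottom_logits(scores, vocab, k=10):
    """Return the k most negative and k most positive (token, score) pairs."""
    if len(scores) != len(vocab):
        raise ValueError("scores and vocab must have the same length")
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    negative = ranked[:k]        # most negative first
    positive = ranked[::-1][:k]  # most positive first
    return negative, positive


if __name__ == "__main__":
    # Toy example: 2 residual dimensions, 4-token vocabulary.
    toy_vocab = ["a", "b", "c", "d"]
    toy_scores = project_to_vocab(
        [1.0, -1.0],
        [[0.5, 0.0, 1.0, -0.2],
         [0.0, 0.5, -0.5, 0.1]],
    )
    neg, pos = top_and_bottom_logits(toy_scores, toy_vocab, k=2)
    print([t for t, _ in neg])  # ['b', 'd']
    print([t for t, _ in pos])  # ['c', 'a']
```

    Sorting once and slicing both ends keeps the two lists consistent with each other; for a real vocabulary of tens of thousands of tokens, a partial selection (such as `heapq.nlargest`) would avoid the full sort.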
    Activation Density: 2.959%
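    Activation density is typically read as the share of tokens on which the feature fires. The sketch below assumes "fires" means an activation strictly above a threshold of zero; the function name and the `threshold` parameter are hypothetical, not taken from the dashboard. Under that reading, 2.959% means the feature is active on roughly 3 of every 100 tokens sampled.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of tokens whose activation exceeds `threshold`.

    Assumption: a token counts as active only if its activation is
    strictly greater than the threshold (0.0 by default).
    """
    total = 0
    active = 0
    for value in activations:
        total += 1
        if value > threshold:
            active += 1
    if total == 0:
        raise ValueError("activations must not be empty")
    return 100.0 * active / total


if __name__ == "__main__":
    # Two of four toy tokens have a positive activation.
    print(activation_density([0.0, 0.3, 0.0, 1.2]))  # 50.0
```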

    No Known Activations