INDEX
    Explanations

    instances indicating categories or classifications

    Negative Logits
        Token              Logit
        " neceff"          -0.94
        " Efq"             -0.93
        " Anſ"             -0.92
        " purpoſe"         -0.90
        " ſtate"           -0.85
        "tvguidetime"      -0.85
        " houſe"           -0.84
        " leſs"            -0.84
        " pleaſure"        -0.84
        " ſch"             -0.82
    Positive Logits
        Token              Logit
        " #"               0.52
        "<eos>"            0.46
        "OLVED"            0.45
        " go"              0.45
        " onResume"        0.45
        "#"                0.44
        " pare"            0.42
        "urlopen"          0.41
        " restore"         0.40
        (missing)          0.40
    Act Density 0.019%

    No Known Activations