INDEX
    Explanations

    mentions of television shows and their hosts

    Negative Logits
     ſind
    -0.92
     itſelf
    -0.84
    .",
    
    -0.83
     poffible
    -0.81
     iſt
    -0.81
     raiſ
    -0.81
    WebServlet
    -0.80
     faſt
    -0.80
    ſelves
    -0.79
     houſe
    -0.78
    Positive Logits
    ....
    0.81
    ?
    0.81
    ...
    0.79
    .....
    0.74
     I
    0.73
    0.73
      
    0.70
    ??
    0.70
    ….
    0.66
    0.64
    Activation Density 0.356%
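    A density figure like this is commonly read as the percentage of tokens in a sample corpus on which the feature is active. The sketch below assumes exactly that definition (activation strictly above a threshold of 0). The function name `activation_density` and the 100,000-token example are hypothetical illustrations, not data or code from this page.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Return the percentage of tokens where one feature's activation exceeds `threshold`.

    Assumption: `activations` holds this feature's activation on every token of a
    sample corpus, and "density" means the share of those tokens where it fires.
    """
    acts = list(activations)
    if not acts:
        raise ValueError("need at least one token activation")
    active = sum(1 for a in acts if a > threshold)  # count tokens where the feature fires
    return 100.0 * active / len(acts)


# Hypothetical example: a feature that fires on 356 of 100,000 tokens.
acts = [1.0] * 356 + [0.0] * (100_000 - 356)
print(f"{activation_density(acts):.3f}%")  # prints 0.356%
```

    Under this reading, a density of 0.356% means the feature fires on roughly 1 token in 280, which marks it as a sparse feature.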

    No Known Activations