INDEX
    Explanations

    words related to publishing and releasing information

    Negative Logits
    rail           -0.87
    usa            -0.81
    cone           -0.75
    ft             -0.70
    ombat          -0.70
    ï¸             -0.68
    restling       -0.67
    usp            -0.67
    uay            -0.66
    avery          -0.66
    Positive Logits
     information    1.13
     anything       1.03
    ulate           1.01
     transcripts    0.99
     confidential   0.96
     truthful       0.96
     excerpts       0.95
     details        0.94
     inaccurate     0.94
     updates        0.94
    Activation Density: 0.185%

    No Known Activations