INDEX
    Explanations

    statements confirming or asserting facts

    assertions of factual claims

    Negative Logits
    Token             Logit
    " Flavoring"      -0.73
    "bye"             -0.69
    " Crown"          -0.68
    " Sov"            -0.67
    " Greenwood"      -0.64
    "South"           -0.60
    " Klux"           -0.59
    " Corinthians"    -0.58
    "incinn"          -0.58
    " CK"             -0.57
    Positive Logits
    Token             Logit
    "ional"           0.97
    "REP"             0.78
    "uality"          0.73
    "olkien"          0.72
    "çī"              0.71
    "netflix"         0.71
    "uracy"           0.71
    "opus"            0.70
    "orial"           0.69
    " ###"            0.69
    Activation Density 0.016%

    No Known Activations