INDEX
    Explanations

    sentences that express strong positive sentiments or admiration towards subjects

    Negative Logits
    Token         Logit
    "/Dk"         -0.16
    "heim"        -0.14
    " weren"      -0.14
    " aren"       -0.14
    "abant"       -0.14
    "icum"        -0.14
    "év"          -0.13
    "ubbo"        -0.13
    " fucked"     -0.13
    "akeup"       -0.13
    Positive Logits
    Token         Logit
    " ROCK"       0.23
    " truly"      0.22
    " rocks"      0.22
    " Rocks"      0.21
    " sure"       0.21
    " totally"    0.20
    " rivals"     0.20
    " rival"      0.19
    " rules"      0.18
    " Rock"       0.18
    Activation Density: 0.139%

    No Known Activations