INDEX
    Explanations

    comparisons of quantity between different scenarios

    phrases that indicate cause and effect relationships

    Negative Logits
    "anka"     -0.72
    "osate"    -0.61
    "laim"     -0.57
    "usp"      -0.57
    "alach"    -0.55
    "Origin"   -0.55
    "stros"    -0.54
    "iasco"    -0.54
    "bara"     -0.54
    "ossus"    -0.54
    Positive Logits
    " fewer"      1.75
    " less"       1.59
    " clearer"    1.54
    " nicer"      1.53
    " quicker"    1.52
    " sharper"    1.50
    " richer"     1.49
    " shorter"    1.47
    " easier"     1.46
    " smoother"   1.46
    Activation Density: 0.895%

    No Known Activations