INDEX
    Explanations

    phrases related to understanding or justification

    expressions of comprehensibility or rationality

    Negative Logits
    Token     Logit
    "orp"     -0.65
    "fast"    -0.63
    "illac"   -0.62
    "EE"      -0.62
    "rock"    -0.60
    "oo"      -0.59
    " moon"   -0.59
    " Brass"  -0.57
    "°"       -0.57
    "TION"    -0.56
    Positive Logits
    Token                Logit
    " understandable"    3.71
    " manageable"        1.50
    " understandably"    1.50
    " admirable"         1.43
    " incomprehensible"  1.36
    " believable"        1.29
    " predictable"       1.22
    " palpable"          1.15
    " inexplicable"      1.12
    " readable"          1.11
    Activation Density: 0.024%

    No Known Activations