INDEX
    Explanations

    phrases indicating relationships and comparisons

    Negative Logits
    Token          Logit
    "rap"          -0.06
    "rus"          -0.06
    "."            -0.06
    "ilk"          -0.05
    "jong"         -0.05
    "Ế"            -0.05
    "rada"         -0.05
    "hurst"        -0.05
    "377"          -0.05
    "Utilities"    -0.05
    Positive Logits
    Token          Logit
    " sense"       0.19
    " strict"      0.17
    " Sense"       0.16
    "sense"        0.16
    "Sense"        0.15
    " sentido"     0.15
    " strictly"    0.14
    " meaning"     0.14
    " broad"       0.14
    " literal"     0.14
    Activation Density 0.032%

    No Known Activations