    Explanations

    phrases indicating similarity or comparison

    phrases that denote comparisons and similarities

    Negative Logits
    Token               Logit
    " squash"           -0.77
    " bang"             -0.69
    " Beng"             -0.64
    " Dota"             -0.62
    " Bund"             -0.61
    "ELY"               -0.59
    " demolition"       -0.59
    " neighbourhood"    -0.58
    " Rumble"           -0.58
    " Derby"            -0.57
    Positive Logits
    Token               Logit
    "chart"              0.86
    "accompan"           0.84
    "æ©Ł"                0.82
    "ctr"                0.78
    "wise"               0.77
    "quartered"          0.76
    "wcs"                0.72
    "forward"            0.70
    "initions"           0.70
    "nown"               0.70
    Activation Density: 0.026%

    No Known Activations