    Explanations

    references to honor and recognition

    Negative Logits
         favor            -0.80
         color            -0.76
         labor            -0.76
         ее               -0.74
         honor            -0.72
         center           -0.69
         Ее               -0.68
         favors           -0.65
         "                -0.65
                          -0.64

    Positive Logits
         neighbourhoods    1.79
        colour             1.78
        Colour             1.75
         humour            1.74
         colours           1.74
        COLOUR             1.74
         neighbourhood     1.72
         honour            1.71
         Honour            1.69
         tumour            1.69
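On SAE feature dashboards, negative and positive logit lists like these are commonly produced by projecting the feature's decoder direction through the model's unembedding matrix and reading off the lowest and highest entries. The sketch below assumes that convention; the function name `logit_effects`, the toy vocabulary, and all weight values are hypothetical and are not taken from the real model. It shows how a direction that favors " colour" and " honour" over " color" and " honor" yields lists shaped like the ones above.

```python
import numpy as np

def logit_effects(decoder_dir, W_U, vocab, k=10):
    """Project a feature's decoder direction through the unembedding matrix.

    Hypothetical helper. decoder_dir has shape (d_model,), W_U has shape
    (d_model, vocab_size). Returns (top_positive, top_negative), each a list
    of (token, logit_value) pairs.
    """
    logits = decoder_dir @ W_U           # direct effect on every vocab logit
    order = np.argsort(logits)           # ascending: most negative first
    top_neg = [(vocab[i], float(logits[i])) for i in order[:k]]
    top_pos = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return top_pos, top_neg

# Toy, made-up weights (d_model = 2, vocab_size = 5); not real model data.
vocab = [" colour", " color", " honour", " honor", " the"]
W_U = np.array([[1.0, -1.0, 0.9, -0.8, 0.0],
                [0.0,  0.0, 0.0,  0.0, 0.1]])
decoder_dir = np.array([1.0, 0.5])

pos, neg = logit_effects(decoder_dir, W_U, vocab, k=2)
print(pos)  # [(' colour', 1.0), (' honour', 0.9)]
print(neg)  # [(' color', -1.0), (' honor', -0.8)]
```

Sorting once with `argsort` and slicing both ends keeps the positive and negative lists consistent with each other; on a real model you would substitute the actual decoder row and unembedding matrix.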

    Activation Density: 0.110%
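Activation density is usually read as the fraction of tokens in a sample corpus on which the feature fires, so 0.110% corresponds to roughly 1.1 tokens per 1,000. The sketch below assumes that definition and an activation threshold of zero; the helper name `activation_density` and the synthetic activation array are hypothetical.

```python
import numpy as np

def activation_density(acts, threshold=0.0):
    """Fraction of tokens whose feature activation exceeds `threshold`.

    Hypothetical helper; assumes `acts` holds one activation per token.
    """
    acts = np.asarray(acts)
    return float(np.mean(acts > threshold))

# Synthetic example: 110 active tokens out of 100,000.
acts = np.zeros(100_000)
acts[:110] = 2.3
density = activation_density(acts)
print(f"{density:.3%}")  # 0.110%
```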

    No Known Activations