INDEX
    Explanations

    phrases indicating similarity or comparability

    phrases emphasizing comparison and similarity

    Negative Logits
    Token          Logit
    "utherford"    -0.79
    "orer"         -0.77
    "orsi"         -0.70
    "omore"        -0.67
    "onite"        -0.65
    "wat"          -0.65
    "irens"        -0.64
    "illard"       -0.63
    "erella"       -0.63
    "izons"        -0.62
    Positive Logits
    Token          Logit
    " manner"      1.71
    " fashion"     1.42
    " ways"        1.38
    " way"         1.30
    " vein"        1.19
    " contexts"    1.11
    " context"     1.08
    " sense"       1.05
    " terms"       1.05
    " guise"       1.04
    Activation Density: 0.270%

    No Known Activations