INDEX
    Negative Logits
    Token                Logit
    ' InputDecoration'   -0.57
    'bottomRight'        -0.43
    ' BorderSide'        -0.42
    'razer'              -0.42
    ' duide'             -0.42
    ' Marzo'             -0.41
    'Proses'             -0.40
    ' Massa'             -0.40
    'Necesito'           -0.40
    ']");'               -0.39

    Positive Logits
    Token                Logit
    ' hotel'             1.09
    ' Hotel'             1.05
    'Hotel'              1.04
    ' hotels'            1.02
    'Hotels'             0.99
    ' Hotels'            0.99
    'hotel'              0.98
    'hotels'             0.97
    ' HOTEL'             0.94
    '<bos>'              0.89

    Activation Density: 0.098%

    No Known Activations