INDEX
Explanations
contractions with the word "not"
negations or contractions
Negative Logits
CI            -0.68
rant          -0.64
senal         -0.64
è£ıè          -0.64
estate        -0.63
referen       -0.63
Person        -0.62
ItemImage     -0.62
successors    -0.61
å½            -0.61
Positive Logits
afford        1.40
imagine       1.01
seem          0.95
rely          0.93
wait          0.91
really        0.90
ignore        0.89
handle        0.89
help          0.87
bluff         0.86
Activations Density 0.042%