INDEX
Explanations
prepositional phrases describing inclusivity or exclusivity
phrases indicating exclusion or differentiation among groups or categories
Negative Logits
ery          -0.81
dayName      -0.70
matic        -0.65
VERTISEMENT  -0.65
etting       -0.63
igure        -0.63
ike          -0.62
illary       -0.62
rig          -0.62
rogram       -0.60
Positive Logits
ones         0.95
ours         0.82
those        0.78
├            0.76
humans       0.73
Including    0.71
those        0.69
except       0.68
yours        0.68
Ones         0.67
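The two lists above read as logit weights: for each vocabulary token, how strongly the feature's output direction pushes that token's logit up (positive) or down (negative). The page does not say how they were computed. A common convention for feature dashboards of this kind, assumed here, is to take the dot product of the feature's decoder vector with each token's unembedding vector and keep the most extreme values. The pure-Python sketch below illustrates that assumption only; `logit_weights`, `top_and_bottom` and all toy vectors are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

def logit_weights(decoder_dir: Sequence[float],
                  unembed: Dict[str, Sequence[float]]) -> Dict[str, float]:
    """Dot the feature's decoder direction with each token's unembedding vector."""
    return {
        tok: sum(d * u for d, u in zip(decoder_dir, col))
        for tok, col in unembed.items()
    }

def top_and_bottom(weights: Dict[str, float], k: int
                   ) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and the k most negative (token, weight) pairs."""
    ranked = sorted(weights.items(), key=lambda kv: kv[1])
    bottom = ranked[:k]          # most negative first
    top = ranked[::-1][:k]       # most positive first
    return top, bottom

# Hypothetical 2-d toy values, invented for illustration only.
decoder_dir = [1.0, -0.5]
unembed = {
    "ones":  [0.9, -0.1],
    "those": [0.6, -0.2],
    "ery":   [-0.7, 0.2],
    "matic": [-0.5, 0.3],
}
top, bottom = top_and_bottom(logit_weights(decoder_dir, unembed), k=2)
print(top)     # positive list: ones, those
print(bottom)  # negative list: ery, matic
```

In a real model the decoder vector and unembedding columns have the model's full hidden width and the vocabulary has tens of thousands of entries, but the ranking step is the same.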
Activation density: 0.208%
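Assuming the density figure is the percentage of token positions on which the feature activates at all (activation above zero), it can be computed as in the sketch below. The function name `activation_density`, its `threshold` parameter and the sample activations are hypothetical.

```python
from typing import Sequence

def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation exceeds `threshold`."""
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)

# Hypothetical sample: the feature fires on 2 of 1,000 tokens.
acts = [0.0] * 998 + [1.3, 0.4]
print(f"{activation_density(acts):.3f}%")  # 0.200%
```

Read this way, a density of 0.208% means the feature fires on roughly 1 token in 480, so it is a sparse feature.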