INDEX
Explanations
instances where an action or expectation is contrasted with an unexpected outcome beginning with "Instead."
phrases indicating contrast or alternatives
Negative Logits
Token          Logit
uble           -0.74
identifiable   -0.66
describ        -0.66
categorized    -0.65
Likes          -0.65
inguishable    -0.64
classify       -0.64
Intellectual   -0.64
acea           -0.62
copyright      -0.61
Positive Logits
Token          Logit
Instead        1.15
Nope           1.10
alas           1.08
Wr             1.07
Instead        1.07
instead        1.00
instead        0.88
Turns          0.86
disappoint     0.74
transpired     0.73
Activations Density 0.391%
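
The page does not say how the activation density is computed. One common reading is the fraction of token positions on which the feature fires at all. The sketch below illustrates that reading only; the function name, the threshold, and the sample data are hypothetical and do not come from the dashboard.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of positions where the feature is active.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 1000 token positions, 4 of which have a nonzero activation.
sample = [0.0] * 996 + [0.8, 1.3, 0.2, 2.1]
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.400%
```

Under this reading, a density of 0.391% would mean the feature is active on roughly 4 of every 1000 tokens in the sampled text.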