Explanations
irony and unexpected twists
concepts related to irony and contradictions
Negative Logits
onding      -0.78
Doing       -0.74
Moving      -0.68
noticing    -0.67
ittal       -0.63
cking       -0.63
realizing   -0.63
deciding    -0.62
moving      -0.62
ivating     -0.61
Positive Logits
resembles     1.05
resembled     0.98
constitutes   0.86
pires         0.86
dominates     0.83
translates    0.82
itiz          0.81
accompanies   0.80
coincides     0.79
includes      0.79
Activation density: 0.215%
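The logit lists rank vocabulary tokens by how strongly this feature pushes their logits up or down, and the activation density reports how often the feature fires. Below is a minimal Python sketch of both summaries. It assumes that the per-token logit effects are already available as a dict and that density means the percentage of tokens with a strictly positive activation. The function names `top_logits` and `activation_density` are hypothetical and do not belong to any particular tool's API.

```python
from typing import Dict, List, Sequence, Tuple

Pair = Tuple[str, float]


def top_logits(effects: Dict[str, float], k: int = 10) -> Tuple[List[Pair], List[Pair]]:
    """Return up to k most positive and k most negative (token, effect) pairs.

    Assumption: `effects` maps each token to the feature's logit effect on it.
    """
    ranked = sorted(effects.items(), key=lambda kv: kv[1])  # ascending by effect
    negative = [kv for kv in ranked if kv[1] < 0][:k]  # most negative first
    positive = [kv for kv in reversed(ranked) if kv[1] > 0][:k]  # most positive first
    return positive, negative


def activation_density(activations: Sequence[float]) -> float:
    """Percentage of tokens on which the feature fires (assumed: activation > 0)."""
    if not activations:
        return 0.0
    fired = sum(1 for a in activations if a > 0)
    return 100.0 * fired / len(activations)


# Usage with a few values taken from the lists above.
effects = {"resembles": 1.05, "resembled": 0.98, "onding": -0.78, "Doing": -0.74}
pos, neg = top_logits(effects, k=2)
print(pos)  # [('resembles', 1.05), ('resembled', 0.98)]
print(neg)  # [('onding', -0.78), ('Doing', -0.74)]
```

Positive and negative tokens are filtered separately. This keeps a short list from mixing both signs when fewer than `k` tokens fall on one side.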