INDEX
Explanations
phrases that express contradiction or opposition
phrases that indicate skepticism or contradiction
NEGATIVE LOGITS
entin      -0.76
aturated   -0.65
pez        -0.60
prus       -0.60
uzz        -0.59
active     -0.58
ournals    -0.58
esta       -0.58
issance    -0.57
waukee     -0.57
POSITIVE LOGITS
standpoint      1.05
perspective     0.89
contrary        0.88
point           0.78
anyway          0.77
least           0.77
viewpoint       0.75
extent          0.74
approximation   0.71
anyways         0.70
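Logit lists like these are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and ranking vocabulary tokens by the result. The following is a minimal sketch under that assumption; the names `decoder_dir`, `W_U`, `vocab`, and `top_logit_effects` are hypothetical, no extra normalization is applied, and the dashboard's exact procedure may differ.

```python
import numpy as np


def top_logit_effects(decoder_dir, W_U, vocab, k=10):
    """Rank tokens by the direct logit effect of a feature's decoder direction.

    decoder_dir: shape (d_model,), the feature's decoder row (assumed layout).
    W_U: shape (d_model, vocab_size), the unembedding matrix (assumed layout).
    vocab: list of token strings, indexed like W_U's columns.
    Returns (positive, negative): the k highest and k lowest (token, value) pairs.
    """
    effects = decoder_dir @ W_U          # one scalar per vocabulary token
    order = np.argsort(effects)          # ascending order of effect
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return positive, negative


# Toy example: a 2-dimensional residual stream and a 4-token vocabulary.
decoder_dir = np.array([1.0, 0.0])
W_U = np.array([[1.05, -0.76, 0.89, 0.10],
                [0.00, 0.00, 0.00, 0.00]])
vocab = ["standpoint", "entin", "perspective", "other"]
positive, negative = top_logit_effects(decoder_dir, W_U, vocab, k=2)
print(positive)  # [('standpoint', 1.05), ('perspective', 0.89)]
print(negative)  # [('entin', -0.76), ('other', 0.1)]
```

With a real model, `vocab[i]` would be the decoded subword for token id `i`, which is why fragments such as "entin" or "waukee" appear in the lists.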
Activation Density: 0.836%
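Activation density is usually reported as the fraction of sampled tokens on which the feature fires. At 0.836%, that is roughly 1 token in 120 (1 / 0.00836 ≈ 119.6). A minimal sketch, assuming "fires" means an activation strictly above a hypothetical `threshold` of 0; the function name `activation_density` is likewise illustrative.

```python
import numpy as np


def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation exceeds `threshold` (assumed definition)."""
    acts = np.asarray(activations, dtype=float)
    return float(np.mean(acts > threshold))


# Toy activations over 8 tokens: the feature fires on 2 of them.
acts = [0.0, 0.0, 0.5, 0.0, 1.2, 0.0, 0.0, 0.0]
print(f"{activation_density(acts):.3%}")  # 25.000%
```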