INDEX
Explanations
questions starting with "Are we...?" or similar interrogative constructions
pronouns indicating personal perspective or involvement in discussion
Negative Logits
Token         Logit
isco          -0.78
aneers        -0.71
bies          -0.71
ameron        -0.70
icts          -0.69
enges         -0.68
links         -0.68
"],           -0.66
mares         -0.65
accompanies   -0.65
Positive Logits
Token         Logit
supposed      1.16
kidding       1.09
gonna         1.08
going         1.01
wasting       0.95
afraid        0.94
hiding        0.93
aware         0.92
glad          0.91
ready         0.90
Activations Density: 0.090%