INDEX
Explanations
emotionally charged or impactful phrases or statements, possibly related to personal experiences or strong reactions
conversational phrases that express opinions or emotions
Negative Logits
Token       Logit
heit        -0.79
abouts      -0.77
cca         -0.68
ady         -0.67
tyard       -0.66
osponsors   -0.66
erning      -0.66
ocobo       -0.65
woo         -0.65
officially  -0.64
Positive Logits
Token       Logit
Changes      0.87
Prof         0.83
SPONSORED    0.83
Certain      0.83
Instead      0.82
Region       0.81
Secondly     0.80
However      0.78
Therefore    0.78
)"           0.77
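The two logit lists are ranked extremes of a per-token score. In many SAE dashboards that score comes from projecting the feature's decoder direction onto the model's unembedding matrix. The sketch below assumes those per-token values are already computed and only shows the ranking step. The function name `top_and_bottom_tokens` is hypothetical. The sample vocabulary reuses four values from the lists above plus one made-up token ("the", 0.01) as filler.

```python
import heapq
from typing import List, Sequence, Tuple


def top_and_bottom_tokens(
    vocab: Sequence[str], logit_effects: Sequence[float], k: int = 10
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and k most negative (token, value) pairs.

    Hypothetical helper; assumes logit_effects[i] is the feature's effect
    on the logit of vocab[i].
    """
    pairs = list(zip(vocab, logit_effects))
    positive = heapq.nlargest(k, pairs, key=lambda p: p[1])   # descending
    negative = heapq.nsmallest(k, pairs, key=lambda p: p[1])  # ascending
    return positive, negative


# Sample data: four values from the lists above plus a made-up token.
vocab = ["heit", "abouts", "Changes", "Prof", "the"]
effects = [-0.79, -0.77, 0.87, 0.83, 0.01]
pos, neg = top_and_bottom_tokens(vocab, effects, k=2)
print(pos)  # [('Changes', 0.87), ('Prof', 0.83)]
print(neg)  # [('heit', -0.79), ('abouts', -0.77)]
```

`heapq.nlargest`/`nsmallest` avoid sorting the full vocabulary when only the top k entries are needed.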
Activation Density: 0.172%
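Activation density is usually reported as the share of token positions on which a feature fires. The sketch below assumes that a position counts as active when its activation is strictly greater than a threshold of 0. The name `activation_density` is hypothetical, and the sample data is synthetic, chosen only to reproduce a 0.172% figure.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Return the percentage of token positions where the feature is active.

    Assumption: "active" means activation > threshold (0.0 by default).
    """
    values = list(activations)
    if not values:
        raise ValueError("need at least one activation")
    active = sum(1 for a in values if a > threshold)
    return 100.0 * active / len(values)


# Synthetic example: 172 active positions out of 100,000 tokens.
acts = [0.0] * 100_000
for i in range(0, 172 * 500, 500):
    acts[i] = 1.3
print(f"{activation_density(acts):.3f}%")  # 0.172%
```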