INDEX
Explanations
phrases indicating emphasis or significance
phrases that indicate discomfort or dissatisfaction in societal contexts
Negative Logits
Windsor        -0.72
clusive        -0.71
Slash          -0.70
Meaning        -0.63
STOR           -0.63
Koen           -0.60
Heist          -0.59
Closing        -0.59
Pieces         -0.59
pandemonium    -0.58
Positive Logits
were           0.96
have           0.95
reacted        0.93
had            0.88
should         0.85
are            0.85
were           0.84
cannot         0.84
who            0.83
behaved        0.82
Activations Density 0.278%