INDEX
Explanations
phrases indicating surprising or unexpected revelations
phrases that indicate a consequential or revealing outcome
Negative Logits
ropolitan           -0.74
atana               -0.72
Colleg              -0.69
è¦ļéĨĴ              -0.68
ordan               -0.66
riot                -0.65
riots               -0.65
ities               -0.64
lain                -0.63
icipated            -0.61
Positive Logits
ĸ                   0.76
Ī                   0.74
WT                  0.71
ij                  0.71
terday              0.70
\\\\\\\\\\\\\\\\    0.69
¸                   0.67
coat                0.66
Ĺ                   0.66
beet                0.66
Activations Density 0.023%