Explanations
phrases that express certainty or emphasis
phrases indicating certainty or frequency
Negative Logits
åĤ  -0.71
Zion  -0.67
umbn  -0.67
tnc  -0.66
ãĤ¶ (ザ)  -0.66
ãģķ (さ)  -0.65
ensis  -0.65
ãģ®éŃĶ (の魔)  -0.64
aciously  -0.64
idated  -0.64
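Several negative-logit tokens are shown as strings like ãĤ¶ rather than readable text. Assuming the model uses a GPT-2-style byte-level BPE vocabulary (an assumption; the page does not name the tokenizer), each displayed character stands for one raw byte, so the original text can be recovered by inverting the byte-to-character table. The sketch below rebuilds that table and decodes a token; `decode_token` is a hypothetical helper name, not a library API.

```python
def bytes_to_unicode() -> dict[int, str]:
    # Rebuild the GPT-2-style reversible table mapping each byte to a printable character.
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    cs = bs[:]
    n = 0
    for b in range(256):
        if b not in bs:
            # Non-printable bytes (e.g. space, 0x80-0xA0) get shifted code points from 256 up.
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, map(chr, cs)))


# Inverse table: displayed character -> original byte value.
UNICODE_TO_BYTE = {c: b for b, c in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    # Hypothetical helper: map each display character back to its byte, then decode as UTF-8.
    raw = bytes(UNICODE_TO_BYTE[ch] for ch in token)
    # errors="replace" turns incomplete multi-byte sequences (such as "åĤ") into U+FFFD.
    return raw.decode("utf-8", errors="replace")


for tok in ["ãĤ¶", "ãģķ", "ãģ®éŃĶ", "åĤ", "Zion"]:
    print(repr(tok), "->", repr(decode_token(tok)))
```

Under this assumption, ãĤ¶ is the katakana ザ and ãģ®éŃĶ is の魔, while åĤ holds only the first two bytes of a three-byte UTF-8 character, so it cannot be decoded on its own.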
Positive Logits
importantly  0.76
entimes  0.75
kidding  0.71
referen  0.70
withstanding  0.69
Sounds  0.68
humans  0.68
Speaking  0.66
unsurprisingly  0.66
Negative  0.65
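The two lists rank vocabulary tokens by how strongly the feature raises or lowers their output logits. One common way to produce such lists for a sparse-autoencoder feature (an assumption here; the page does not state its method) is to project the feature's decoder direction through the model's unembedding matrix and keep the k largest and k smallest entries. The sketch below uses small random stand-in weights; `top_logit_tokens`, `w_dec`, `w_u`, and the `tok…` vocabulary are all hypothetical, not real model data.

```python
import random


def top_logit_tokens(decoder_dir, w_u, vocab, k=10):
    """Return the k most-promoted and k most-suppressed tokens for one feature.

    decoder_dir: list of d_model floats (the feature's assumed decoder direction).
    w_u:         d_model rows x vocab_size columns (assumed unembedding matrix).
    """
    vocab_size = len(vocab)
    # Direct effect of the feature direction on each token's logit: decoder_dir @ w_u.
    logits = [
        sum(decoder_dir[d] * w_u[d][t] for d in range(len(decoder_dir)))
        for t in range(vocab_size)
    ]
    ranked = sorted(range(vocab_size), key=lambda t: logits[t])  # ascending
    negative = [(vocab[t], logits[t]) for t in ranked[:k]]
    positive = [(vocab[t], logits[t]) for t in reversed(ranked[-k:])]
    return positive, negative


rng = random.Random(0)
d_model, vocab_size = 8, 20
vocab = [f"tok{i}" for i in range(vocab_size)]                 # hypothetical vocabulary
w_dec = [rng.gauss(0, 1) for _ in range(d_model)]              # hypothetical decoder direction
w_u = [[rng.gauss(0, 1) for _ in range(vocab_size)] for _ in range(d_model)]  # hypothetical unembedding

pos, neg = top_logit_tokens(w_dec, w_u, vocab, k=3)
print("positive:", pos)
print("negative:", neg)
```

Sorting once and slicing both ends keeps the two lists consistent with each other; for a real vocabulary of tens of thousands of tokens, a vectorized matrix product would be the practical choice.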
Activation Density: 0.102%
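Activation density is read here as the fraction of sampled tokens on which the feature's activation is nonzero (the usual meaning on such dashboards, assumed rather than stated on the page). At 0.102%, the feature fires on roughly one token in 980, since 1 / 0.00102 ≈ 980. A minimal sketch of the computation over a hypothetical list of per-token activations (`activation_density` and the sample data are illustrative, not real measurements):

```python
def activation_density(activations, threshold=0.0):
    # Fraction of tokens whose activation exceeds the threshold (default: any positive value).
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 1,000,000 tokens, of which 1,020 activate the feature.
acts = [1.5] * 1_020 + [0.0] * (1_000_000 - 1_020)
density = activation_density(acts)
print(f"{density:.3%}")    # density as a percentage
print(round(1 / density))  # about how many tokens per activation
```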