Explanations
phrases indicating certainty or near-certainty
phrases that convey frequency or prevalence
Negative Logits
Ö            -0.74
ocratic      -0.69
ר            -0.68
oted         -0.68
brim         -0.68
è¦ļéĨĴ       -0.67
à¤           -0.66
eding        -0.66
messenger    -0.65
æŃ¦          -0.65
Positive Logits
ths          0.83
ogether      0.83
Berry        0.78
terson       0.76
urion        0.73
Enough       0.72
Says         0.72
irteen       0.71
irds         0.70
Benefits     0.70
Activations Density: 0.009%