INDEX
Explanations
clarity or emphasis in statements
phrases indicating clarity or making something clear
Negative Logits
inse      -0.74
umbn      -0.68
umat      -0.66
asus      -0.65
ernels    -0.65
unte      -0.64
chance    -0.64
hovah     -0.63
Rou       -0.63
olen      -0.63
Positive Logits
ances           0.91
distinctions    0.89
outlines        0.83
iary            0.81
distinction     0.79
forth           0.74
sailing         0.74
clear           0.71
contrasts       0.68
clearer         0.68
Activation Density 0.019%