Explanations
phrases related to policies, decisions, or positions
the presence of specific punctuation or formatting markers
Negative Logits
ģĸ           -0.74
diplom       -0.71
adm          -0.69
differe      -0.69
é¾įå¥ij士    -0.68
æ©           -0.68
Aval         -0.66
Slov         -0.66
compe        -0.65
phyl         -0.65
Positive Logits
redict        1.30
aired         1.28
ossession     1.27
ossible       1.24
ulse          1.18
ardon         1.17
uzzle         1.17
ierce         1.16
odcast        1.16
uls           1.16
Activation Density: 0.034%