INDEX
Explanations
Anthropic, anthropology, anthros
Negative Logits
Token    Logit
ت        1.31
т        1.20
त        1.02
t        0.98
AA       0.93
د        0.93
ก        0.93
る       0.93
ES       0.90
ا        0.88
Positive Logits
Token         Logit
<0x80>        0.80
'             0.79
in            0.77
;             0.77
              0.73
in            0.72
judiciary     0.68
불구하고      0.68
              0.66
Philippine    0.66
Activations Density 0.031%