Explanations
cowardly, cowardice, or coward
Negative Logits
.    0.94
be    0.83
ور    0.80
ro    0.77
نا    0.77
res    0.75
ل    0.72
لر    0.71
س    0.71
ق    0.70
Positive Logits
y    0.84
]:    0.69
findpost    0.68
ੁ    0.67
𝘢    0.66
o    0.65
IdleSync    0.65
como    0.64
ați    0.64
ை    0.64
Activations Density: 0.001%