Explanations
describes states or qualities
Negative Logits
Token     Value
КА        0.88
০০        0.73
Faça      0.73
ها        0.72
Ди        0.71
ম্ভীর      0.71
Dengan    0.71
RAFT      0.70
ხვევ      0.70
БА        0.70
Positive Logits
Token       Value
–           0.71
,           0.69
(           0.65
ik          0.65
ic          0.64
יים         0.63
deciding    0.63
warranted   0.62
;           0.61
necesit     0.60
Activations Density: 0.320%