Explanations
equality, Monologue, Scenario, response rewritten
Negative Logits
Ironically    0.63
utmost        0.62
Essentially   0.61
tradiz        0.60
Magic         0.60
Granted       0.58
собственно    0.57
उत्तर    0.56
ುದು    0.56
famously      0.56
Positive Logits
ترین    0.77
점에서    0.75
پرداخت    0.68
umento    0.67
Than      0.67
অর্থাৎ    0.66
परन्तु    0.66
nhưng     0.65
ሆኑ    0.65
版本    0.65
Activations Density: 1.132%