INDEX
Explanations
positive descriptors and adjectives that convey approval or enhancement
Negative Logits
ViewFeatures    -0.71
клопе           -0.70
شهاد            -0.68
itſelf          -0.66
ειτουργ         -0.66
NewUrlParser    -0.64
QMetaType       -0.62
חיצוניים        -0.62
kaynağından     -0.61
Monfieur        -0.60
POSITIVE LOGITS
<eos>           0.61
']").           0.59
']}             0.57
)               0.56
)}              0.52
pros            0.52
ⓧ               0.51
}}"             0.50
`),             0.49
*{\             0.49
Activations Density 0.517%