INDEX
Explanations
`ls`, `str`, `discriminator`, `statement`
Negative Logits
س  0.48
Fre  0.43
0.41
St  0.40
Mün  0.39
Sol  0.38
Thoughts  0.38
Col  0.38
اد  0.38
Amherst  0.38
Positive Logits
информация  0.50
ъ  0.49
UNK  0.47
មើ  0.46
attentively  0.45
ِيم  0.44
фигу  0.44
’:  0.44
ਿਆਂ  0.43
你有  0.43
Activations Density 0.006%