INDEX
Explanations
encouragement for further action
Negative Logits
ing    1.19
ang    1.19
ि    1.09
ong    1.07
It    1.05
are    1.02
ter    1.00
ោក    0.98
า    0.97
h    0.96
Positive Logits
も    1.55
,    1.43
ר    1.43
;    1.42
ও    1.39
р    1.35
4    1.30
י    1.30
5    1.29
6    1.29
Activations Density 0.176%