NEGATIVE LOGITS
Token       Logit
iding       0.29
Protection  0.28
ن           0.28
=           0.28
ствует      0.28
↵           0.27
ం           0.27
as          0.26
م           0.26
的基础上    0.25
POSITIVE LOGITS
Token       Logit
virtue      0.79
dint        0.58
means       0.53
zantine     0.52
mistake     0.44
Virtue      0.43
necessity   0.43
衷          0.41
means       0.38
virtues     0.36
Activation density: 0.074%