Explanations
genuine and profound qualities
Negative Logits
Token       Value
t           0.79
larda       0.75
l           0.72
n           0.70
ol          0.63
ag          0.62
r           0.62
rk          0.61
ேட்         0.61
m           0.60
Positive Logits
Token       Value
真正的      0.91
truly       0.82
真正        0.72
verdadeira  0.67
genuine     0.64
Truly       0.63
verdadera   0.63
genuinely   0.62
véritable   0.59
true        0.57
Activations Density: 0.031%