Explanations
affirmations and factual statements
Negative Logits
fere  0.72
किसी  0.64
terk  0.63
(\  0.62
direito  0.61
NIH  0.60
uttam  0.60
എ  0.59
ആയി  0.59
Hoff  0.59
Positive Logits
確かに  1.60
실제로  1.47
действительно  1.40
indeed  1.27
কথাটা  1.23
effectivement  1.21
Indeed  1.18
Indeed  1.16
doğrud  1.13
确实  1.12
Activations Density 0.095%