Explanations
reported claims or situations
Negative Logits
oppure        0.37
arba          0.36
Recuper       0.36
either        0.36
されている    0.35
또는          0.35
ஃப்           0.35
Occasionally  0.35
或者          0.34
Confusion     0.34
Positive Logits
famously      0.55
(!)           0.54
якобы         0.51
supposedly    0.49
purportedly   0.48
(!)           0.47
allegedly     0.47
据说          0.45
(!            0.41
(!            0.41
Activations Density: 0.048%