Explanations
allegedly, reportedly, supposedly
Negative Logits
Usually        0.43
annel          0.43
probabilities  0.38
Refer          0.38
Generally      0.38
абсолютно      0.38  (Russian, "absolutely")
Ill            0.37
Prob           0.37
Signed         0.37
一批           0.37  (Chinese, "a batch")
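Positive Logits
allegedly      1.83
reportedly     1.63
supposedly     1.56
якобы          1.54  (Russian, "allegedly")
purportedly    1.48
according      1.42
menurut        1.40  (Indonesian/Malay, "according to")
据             1.36  (Chinese, "according to")
volgens        1.35  (Dutch, "according to")
据说           1.32  (Chinese, "it is said")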
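The page does not say how these two lists are produced. A common convention for sparse-autoencoder feature dashboards is to multiply the feature's decoder direction by the model's unembedding matrix and list the tokens with the largest and smallest resulting scores. The sketch below assumes that convention, ignores the final layer norm, and uses toy values. `logit_effects`, `top_and_bottom`, and all data are hypothetical and are not this feature's real weights.

```python
# Minimal sketch, assuming logit scores = decoder_dir @ W_U (final layer norm ignored).
# All names and numbers here are hypothetical toy values, not the real feature's data.

def logit_effects(decoder_dir, W_U):
    """Score each vocabulary token by the dot product of the feature's
    decoder direction with that token's column of the unembedding matrix."""
    d_model = len(decoder_dir)
    vocab_size = len(W_U[0])
    return [
        sum(decoder_dir[i] * W_U[i][j] for i in range(d_model))
        for j in range(vocab_size)
    ]


def top_and_bottom(scores, tokens, k):
    """Return the k highest-scoring (positive list) and k lowest-scoring (negative list) tokens."""
    order = sorted(range(len(scores)), key=lambda j: scores[j])  # ascending by score
    bottom = [(tokens[j], scores[j]) for j in order[:k]]
    top = [(tokens[j], scores[j]) for j in reversed(order[-k:])]
    return top, bottom


tokens = ["allegedly", "reportedly", "Usually", "the"]
decoder_dir = [1.0, 0.5]            # toy d_model = 2
W_U = [[1.0, 0.8, -0.3, 0.0],       # rows: d_model, columns: vocabulary
       [0.6, 0.5, -0.2, 0.1]]

scores = logit_effects(decoder_dir, W_U)
top, bottom = top_and_bottom(scores, tokens, k=2)
print(top)     # highest scores first
print(bottom)  # lowest scores first
```

Under this convention, a token such as "allegedly" ranks at the top when its unembedding column points in nearly the same direction as the feature's decoder vector, so the feature directly raises that token's logit when it fires.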
Activation Density: 0.022%
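Activation density is usually read as the share of sampled tokens on which the feature activates, that is, has an activation above zero. At 0.022%, that is about 22 tokens per 100,000 (0.022 / 100 × 100,000 = 22). The sketch below assumes a threshold of zero. `activation_density` is a hypothetical helper and the activations are synthetic.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of tokens whose feature activation exceeds `threshold`."""
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)


# Synthetic example: 22 firing tokens out of 100,000.
acts = [0.0] * 100_000
for i in range(22):
    acts[i * 4_000] = 1.5  # arbitrary positive activation at 22 distinct positions
print(activation_density(acts))  # 0.022
```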