Explanations
debunking false claims and ideas
Negative Logits
Token  Logit
明白了  0.42
جانتے  0.42
埬  0.41
यामुळे  0.41
Unexpected  0.40
desconocido  0.39
رموز  0.38
неожидан  0.38
Knowing  0.37
remembered  0.37
Positive Logits
Token  Logit
claim  1.36
claims  1.35
assertion  1.31
behaupt  1.29
assertions  1.27
claiming  1.20
notion  1.17
claims  1.15
Claim  1.13
Claims  1.09
Activations Density 0.041%