Explanations
Reported statements or claims made by individuals, especially assertions and declarations about events or issues.
Negative Logits
츠         -0.15
oret      -0.15
theless   -0.14
alam      -0.14
sip       -0.13
Kir       -0.13
iah       -0.13
テル       -0.13
vị        -0.13
abei      -0.13
Positive Logits
:↵          0.17
:↵↵         0.16
:"↵         0.16
:\r↵        0.15
simply      0.15
repeatedly  0.15
jac         0.15
danmark     0.14
olin        0.14
-mf         0.14
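The logit lists show which output tokens this feature pushes up (positive) or down (negative). Feature dashboards commonly produce such lists by projecting the feature's decoder direction through the model's unembedding matrix and keeping the largest and smallest entries. That is an assumption here and is not confirmed for this page. The sketch below uses a tiny made-up vocabulary, decoder vector, and unembedding matrix, all hypothetical, to show that computation.

```python
# Hypothetical sketch: logit effects of a sparse-autoencoder feature.
# Assumption: effect[t] = sum_i d[i] * W_U[i][t], i.e. the feature's decoder
# direction d projected through an unembedding matrix W_U (d_model x vocab).

def logit_effects(decoder_dir, unembed):
    """Return one logit contribution per vocabulary token."""
    d_model = len(decoder_dir)
    vocab_size = len(unembed[0])
    return [
        sum(decoder_dir[i] * unembed[i][t] for i in range(d_model))
        for t in range(vocab_size)
    ]

def top_and_bottom(vocab, effects, k):
    """Return the k most positive and k most negative (token, effect) pairs."""
    ranked = sorted(zip(vocab, effects), key=lambda pair: pair[1])
    negative = ranked[:k]                    # most negative first
    positive = list(reversed(ranked[-k:]))   # most positive first
    return positive, negative

# Toy example with made-up numbers (d_model = 2, vocabulary of 4 tokens).
vocab = [":\n", "simply", "oret", "theless"]
decoder_dir = [1.0, 0.5]
unembed = [
    [0.2, 0.1, -0.1, -0.3],
    [0.1, 0.2, -0.2, 0.0],
]
effects = logit_effects(decoder_dir, unembed)
positive, negative = top_and_bottom(vocab, effects, k=2)
print("positive:", positive)
print("negative:", negative)
```

Real dashboards work over the full vocabulary, which has tens of thousands of tokens, and may rescale the effects. The ranking step itself is the same.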
Activation Density: 0.130%
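This sketch reads activation density as the percentage of tokens on which the feature's activation is above zero. That definition is an assumption about how the page computes the figure. Under it, 0.130% means the feature fires on roughly 13 of every 10,000 tokens. The example uses hypothetical activation values.

```python
# Hypothetical sketch: activation density as the percentage of tokens
# whose feature activation is strictly above a threshold (assumed to be 0).

def activation_density(activations, threshold=0.0):
    """Percentage of activations strictly greater than `threshold`."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)

# Made-up data: 13 active tokens out of 10,000.
acts = [0.0] * 10_000
for i in range(13):
    acts[i * 700] = 1.5
density = activation_density(acts)
print(f"Activation density: {density:.3f}%")
```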