Explanations
instances of confirmation or verification of statements or findings
Negative Logits
Sk          -0.71
Rok         -0.68
Rok         -0.66
z           -0.63
Ling        -0.60
Sk          -0.60
േ           -0.57
duction     -0.57
jsx         -0.56
she         -0.56
Positive Logits
Confirm         2.04
confirmations   2.02
confirmed       1.98
confirmation    1.95
confirm         1.95
Confirmed       1.93
confirms        1.92
Confirmation    1.89
confirming      1.83
confirmed       1.82
Activations Density 0.148%