INDEX
Explanations
statements or questions that require verification or confirmation
Negative Logits
Rok      -0.73
z        -0.71
Rok      -0.71
Sk       -0.70
tas      -0.63
Tas      -0.62
Pan      -0.62
lingua   -0.61
Sk       -0.60
yat      -0.60
Positive Logits
confirmations   1.53
Confirm         1.48
Confirmation    1.39
confirms        1.33
Confirmed       1.33
confirmation    1.30
confirm         1.29
CONFIRM         1.29
confirmed       1.29
Affirm          1.27
Activations Density: 0.127%