INDEX
Explanations
correctness, inaccuracy, and untruth
Negative Logits
Token        Logit
autogui      0.44
carbam       0.41
ếc           0.41
MOS          0.40
раствор      0.39
सॉल्यूशन      0.39
ಸಮಸ್ಯ         0.38
tet          0.38
imaginable   0.38
હાથ          0.36
Positive Logits
Token          Logit
inaccurate     1.66
accurate       1.66
untrue         1.57
inaccuracy     1.55
accurate       1.49
Accurate       1.48
inaccuracies   1.47
accuracy       1.41
准确           1.30
Accuracy       1.29
Activations Density: 0.027%