Explanations
statements or facts regarded as accurate
statements related to accuracy and precision
Negative Logits
Token      Logit
loo        -0.80
acid       -0.79
AIN        -0.75
ovan       -0.73
doms       -0.71
Aid        -0.71
inki       -0.71
ATA        -0.70
Parties    -0.70
joining    -0.69
Positive Logits
Token        Logit
uracy        1.16
portrayal    1.00
inacc        0.97
inaccurate   0.92
accuracy     0.92
depiction    0.89
appraisal    0.87
imates       0.84
inaccur      0.83
inately      0.81
Activations Density: 0.019%