Explanations
misleading or false information in text, including statements that are incorrect or fraudulent
Negative Logits
iments          -0.80
ĸļ              -0.79
iment           -0.74
oleon           -0.69
ista            -0.63
isms            -0.60
iens            -0.59
arya            -0.58
achine          -0.57
illas           -0.57
Positive Logits
named           0.68
inflated        0.67
priced          0.67
diagnosed       0.66
ãĤ©             0.66
unfocusedRange  0.65
label           0.64
accused         0.62
ball            0.62
accuse          0.62
Activations Density 7.534%