INDEX
Explanations
instances of information being deemed incorrect or inaccurate
terms that denote falsehood or inaccuracies
Negative Logits
agra       -0.99
abiding    -0.79
ramid      -0.78
trak       -0.77
neys       -0.76
mun        -0.75
hibit      -0.73
aea        -0.73
gans       -0.72
hov        -0.72
Positive Logits
inaccurate        1.11
inaccur           1.10
incorrect         1.04
erroneous         0.99
misinformation    0.98
guiActiveUn       0.96
misconceptions    0.96
incorrectly       0.93
mistaken          0.90
misconception     0.90
Activations Density 0.028%
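
For readers unfamiliar with the density figure, here is a minimal sketch of how such a number could be computed. It assumes that density means the fraction of sampled tokens on which the feature's activation is above zero; the dashboard does not state its exact definition. The function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and chosen only so the arithmetic reproduces 0.028%.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" is the share of sampled tokens on which the
    feature fires at all; the source does not define it.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 28 active tokens out of 100,000 (not data from the source).
sample = [0.0] * 99_972 + [1.0] * 28
density = activation_density(sample)
print(f"{density:.3%}")
```

Under that assumption, a density of 0.028% means the feature fires on roughly 28 of every 100,000 tokens, which is consistent with a feature that responds to a narrow set of words such as those in the positive-logit list.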