Explanations
phrases indicating something is untrue or inaccurate
terms and phrases related to falsehoods and inaccuracies
Negative Logits
Token        Logit
ofi          -0.79
ership       -0.79
incinn       -0.78
swing        -0.78
hov          -0.76
ingham       -0.76
adobe        -0.75
issue        -0.73
arya         -0.73
abiding      -0.73
Positive Logits
Token          Logit
inaccur        1.30
inaccurate     1.17
untrue         1.13
misinformation 1.04
inacc          0.99
misrepresent   0.99
テ             0.91
misled         0.90
falsehood      0.88
misunderstand  0.88
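The page does not say how these logit lists are produced. A common approach for sparse-autoencoder features is to project the feature's decoder direction through the model's unembedding matrix and rank the resulting per-token scores. The minimal sketch below assumes exactly that, ignores the final layer norm, and uses hypothetical names (`feature_logits`, `top_tokens`) and toy numbers. It shows the ranking step only and is not necessarily the method this dashboard uses.

```python
from typing import List, Sequence, Tuple

def feature_logits(direction: Sequence[float],
                   unembed: Sequence[Sequence[float]]) -> List[float]:
    """Score every vocabulary token by direction @ unembed.

    Assumption: `unembed` is laid out as d_model rows x vocab_size columns,
    and layer norm before the unembedding is ignored.
    """
    d_model = len(direction)
    vocab_size = len(unembed[0])
    return [sum(direction[i] * unembed[i][t] for i in range(d_model))
            for t in range(vocab_size)]

def top_tokens(direction: Sequence[float],
               unembed: Sequence[Sequence[float]],
               vocab: Sequence[str],
               k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and k most negative (token, logit) pairs."""
    scored = sorted(zip(vocab, feature_logits(direction, unembed)),
                    key=lambda pair: pair[1])
    negative = scored[:k]           # lowest scores first
    positive = scored[::-1][:k]     # highest scores first
    return positive, negative

# Toy example: a 2-dimensional residual stream and a 3-token vocabulary.
vocab = ["untrue", "swing", "the"]
unembed = [[1.0, -0.5, 0.0],
           [0.0, 1.0, 0.2]]
direction = [1.0, -0.5]
pos, neg = top_tokens(direction, unembed, vocab, k=1)
print(pos)  # prints [('untrue', 1.0)]
print(neg)  # prints [('swing', -1.0)]
```

With a real model, `direction` would be one row of the autoencoder's decoder weights and `unembed` the model's unembedding matrix. Both are assumptions here.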
Activation Density: 0.015%
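Activation density is usually read as the percentage of token positions on which the feature fires. At 0.015%, that is about 15 activations per 100,000 tokens, so this feature is very sparse. The minimal sketch below assumes "fires" means an activation strictly above zero. The helper name `activation_density`, the threshold, and the sample data are all hypothetical.

```python
from typing import Sequence

def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation exceeds `threshold`.

    Assumption: a feature counts as active when its activation is > threshold
    (0.0 by default); the dashboard's exact criterion is not stated.
    """
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)

# Hypothetical sample: 3 active positions out of 20,000 tokens.
acts = [0.0] * 20_000
for position in (17, 4_096, 15_000):
    acts[position] = 2.5
print(f"{activation_density(acts):.3f}%")  # prints 0.015%
```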