Explanations
phrases related to disbelief or rejection of claims
terms related to claims that are considered invalid, unsupported, or lacking credibility
Negative Logits
udeau    -0.83
illes    -0.81
hov      -0.79
odon     -0.73
Franç    -0.71
anse     -0.71
oise     -0.71
eeper    -0.70
toc      -0.69
irie     -0.69
Positive Logits
baseless       1.03
unfounded      1.01
False          0.91
allegations    0.90
accusations    0.88
Rum            0.85
ãĥ¥            0.84
false          0.83
allegation     0.82
theories       0.80
Activations Density 0.022%