Explanations
claimed statements or theories that have been proven false or incorrect
terms associated with debunking, disproving, or discrediting claims or ideas
Negative Logits
ebus         -0.99
actionGroup  -0.80
NetMessage   -0.77
eyed         -0.74
arta         -0.65
aws          -0.65
hold         -0.65
alez         -0.65
son          -0.63
ouver        -0.62
Positive Logits
debunked     1.15
debunk       1.10
discredited  0.94
dispro       0.94
idated       0.92
unfounded    0.87
ãĤ©          0.85
refute       0.84
proof        0.82
refuted      0.80
Activations Density 0.016%
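
For readers unfamiliar with the density figure, the sketch below shows one common way such a number is computed: the fraction of tokens on which the feature's activation is above zero. This is an assumption about how the dashboard defines the value, not something the page states. The function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical and chosen only to reproduce a figure of the same magnitude.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumed definition: density = (number of active tokens) / (total tokens).
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 100,000 tokens, 16 of which activate the feature.
acts = [0.0] * 100_000
for i in range(16):
    acts[i * 6_000] = 1.5  # arbitrary nonzero activation value

density = activation_density(acts)
print(f"{density:.3%}")
```

Under this definition, a density of 0.016% means the feature fires on roughly 16 out of every 100,000 tokens, which is consistent with a narrow feature tied to debunking-related vocabulary.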