Explanations
phrases and discussions about debunking myths or misinformation
Negative Logits
rawn   -0.16
acho   -0.16
estre  -0.15
acons  -0.14
Dou    -0.14
ảm     -0.14
lán    -0.14
ito    -0.14
ồn     -0.13
unn    -0.13
Positive Logits
ref        0.39
refute     0.34
debunk     0.31
dispro     0.31
bust       0.29
dispute    0.28
disp       0.28
challenge  0.28
reb        0.26
Disp       0.26
Activations Density 0.344%