Explanations
claims and misconceptions about various topics, particularly health and societal issues
Negative Logits
ết            -0.16
TODO          -0.14
igits         -0.14
ourmet        -0.14
åĵģ           -0.14
zzo           -0.13
ænd           -0.13
cef           -0.13
Classified    -0.13
orsche        -0.13
Positive Logits
myths          0.42
myth           0.41
Myth           0.36
perception     0.33
perceptions    0.31
stereotypes    0.30
mythology      0.29
commonly       0.27
false          0.27
miscon         0.27
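Logit lists like these are commonly produced by projecting a feature's decoder direction onto the model's unembedding matrix and reading off the most positive and most negative entries. The sketch below illustrates that idea under stated assumptions: `decoder_dir`, `unembed`, and the toy vocabulary are hypothetical stand-ins, not the weights behind this feature, and the page does not confirm the exact procedure used.

```python
# Hypothetical sketch: rank tokens by how much a feature's decoder direction
# pushes each token's logit, via a dot product with the unembedding matrix.

def logit_effects(decoder_dir, unembed, vocab):
    """Map each token to dot(decoder_dir, unembed[:, j]).

    decoder_dir: d_model floats (the feature's decoder row, assumed)
    unembed: d_model rows, each holding len(vocab) floats (W_U, assumed)
    """
    effects = {}
    for j, token in enumerate(vocab):
        effects[token] = sum(decoder_dir[i] * unembed[i][j]
                             for i in range(len(decoder_dir)))
    return effects


def top_k(effects, k, largest=True):
    """Return the k (token, value) pairs with the largest (or smallest) values."""
    return sorted(effects.items(), key=lambda kv: kv[1], reverse=largest)[:k]


# Toy numbers for illustration only; not this feature's real weights.
vocab = ["myths", "myth", "TODO", "zzo"]
decoder_dir = [1.0, 0.5]
unembed = [
    [0.4, 0.2, -0.1, 0.0],
    [0.0, 0.2, -0.2, -0.2],
]
effects = logit_effects(decoder_dir, unembed, vocab)
print(top_k(effects, 2))                 # most positive
print(top_k(effects, 1, largest=False))  # most negative
```

Read this way, a feature whose top positive tokens are "myths", "myth", and "miscon" would, when active, raise the model's likelihood of emitting those tokens next.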
Activation Density: 0.345%
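Activation density is usually read as the share of sampled tokens on which the feature's activation is above zero. At 0.345%, that is roughly 345 of every 100,000 tokens. The sample size and firing threshold behind this figure are not stated, so both are assumptions in the minimal sketch below.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of token positions where a feature's activation exceeds threshold."""
    if not activations:
        raise ValueError("activations must be non-empty")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# Hypothetical sample: the feature fires on 345 of 100,000 tokens.
sample = [1.0] * 345 + [0.0] * (100_000 - 345)
print(f"{activation_density(sample):.3f}%")  # prints 0.345%
```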