INDEX
Explanations
references to myths or mythological content
references to myths and mythological concepts
Negative Logits
ennis         -0.70
iew           -0.70
ensitive      -0.68
onduct        -0.67
ouch          -0.67
arry          -0.66
deliveries    -0.66
burg          -0.64
Assistant     -0.64
ACC           -0.64
Positive Logits
myth             3.71
myths            2.96
Myth             2.73
Myth             2.68
mythology        2.42
legend           1.86
mythical         1.83
misconception    1.78
folklore         1.74
legends          1.51
Activations Density: 0.023%