INDEX
Explanations
absurdity and ridiculousness
NEGATIVE LOGITS
torn        0.43
eek         0.40
tricky      0.39
型          0.39
څرنګ        0.38
Interesting 0.38
型          0.38
সমস্যায়      0.37
சிக்க        0.37
不好        0.37
POSITIVE LOGITS
absurd      1.09
ridiculous  1.06
absurdity   0.98
absur       0.89
stupid      0.89
ludicrous   0.88
silly       0.86
foolish     0.82
nonsense    0.82
insane      0.78
Activations Density 0.016%