Explanations
exclamations and interjections
Negative Logits
il    0.65
ik    0.57
i    0.53
ఒ    0.53
كتب    0.52
GJ    0.52
क    0.50
im    0.50
Để    0.50
ంబేద్కర్    0.50
Positive Logits
comedy    0.47
kidding    0.46
sailor    0.46
admiring    0.46
horrors    0.45
ktorá    0.44
comical    0.44
famously    0.44
是    0.44
wonder    0.44
Activations Density: 0.865%