Explanations
unnecessary or redundant content
NEGATIVE LOGITS
䉼  0.75
thả  0.75
䣼  0.73
猛  0.72
leck  0.72
餬  0.70
TMN  0.69
صی  0.69
kken  0.68
asso  0.68
POSITIVE LOGITS
unnecessary  3.56
superfluous  3.24
redundant  2.77
inutile  2.77
irrelevant  2.70
pointless  2.51
useless  2.50
unimportant  2.46
unnecessarily  2.39
अनावश्यक  2.35
Activations Density: 0.474%