Explanations
asking for recommendations or help
Negative Logits
liked     0.81
men       0.71
MEN       0.69
owers     0.68
sl        0.67
asking    0.66
şey       0.64
obraz     0.64
әп        0.64
voglio    0.62
Positive Logits
recommendations    0.90
hona               0.89
Đây                0.87
ideas              0.87
ihrem              0.87
insights           0.86
经验               0.86
insights           0.84
ампли              0.84
Ideas              0.83
Activations Density 0.028%