Explanations
References to the concept of "more" or of increased quantity.
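Negative Logits
Token                                Logit
asse                                 -0.37
Exercise                             -0.36
exercise                             -0.36
schermata (Italian: "screenshot")    -0.34
alert                                -0.34
court                                -0.33
وضع (Arabic: "mode, status")         -0.33
Span                                 -0.32
お問い合わせ (Japanese: "inquiry")     -0.32
轭 (Chinese: "yoke")                  -0.31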
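Positive Logits
Token                                                        Logit
More                                                         1.12
MORE                                                         0.95
more                                                         0.86
Più (Italian: "more")                                        0.85
MORE                                                         0.83
More                                                         0.78
engkapnya (fragment of Indonesian "selengkapnya", "in full")  0.72
mehr (German: "more")                                        0.71
Mehr (German: "more")                                        0.71
more                                                         0.69

Repeated surface forms (More, MORE, more) presumably correspond to distinct vocabulary entries, for example with and without a leading space.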
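The two logit lists rank vocabulary tokens by how strongly the feature's output direction pushes their logits up or down. A common way to produce such lists, assumed here rather than confirmed by this page, is to take the dot product of the feature's decoder vector with each column of the model's unembedding matrix, ignoring the final layer norm. This minimal sketch uses hypothetical helper names (`logit_effects`, `top_and_bottom`) and toy numbers, not the real model weights.

```python
from typing import List, Sequence, Tuple


def logit_effects(decoder_direction: Sequence[float],
                  unembedding: Sequence[Sequence[float]]) -> List[float]:
    """Return the direct logit effect of a feature on every vocabulary token.

    decoder_direction: the feature's decoder vector, length d_model.
    unembedding: a d_model x d_vocab matrix (row i = model dimension i,
    column j = token j). The final layer norm is ignored (assumption).
    """
    d_model = len(decoder_direction)
    d_vocab = len(unembedding[0])
    return [
        sum(decoder_direction[i] * unembedding[i][j] for i in range(d_model))
        for j in range(d_vocab)
    ]


def top_and_bottom(vocab: Sequence[str], effects: Sequence[float],
                   k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and the k most negative (token, effect) pairs."""
    ranked = sorted(zip(vocab, effects), key=lambda pair: pair[1])
    bottom = ranked[:k]       # most negative first
    top = ranked[::-1][:k]    # most positive first
    return top, bottom


if __name__ == "__main__":
    # Toy weights: d_model = 2, vocabulary of 4 tokens (not real model values).
    vocab = ["more", "Mehr", "alert", "court"]
    decoder = [1.0, 0.5]
    W_U = [[1.0, 0.8, -0.2, -0.1],
           [0.4, 0.2, -0.3, -0.4]]
    top, bottom = top_and_bottom(vocab, logit_effects(decoder, W_U), k=2)
    print("positive:", top)
    print("negative:", bottom)
```

Sorting once and slicing from both ends keeps the positive and negative lists consistent with each other; a real vocabulary would call for a vectorized matrix product instead of nested Python loops.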
Activation Density: 0.003%
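Activation density is usually reported as the share of tokens in a sample dataset on which the feature's activation is above zero; the page states neither the threshold nor the dataset, so both are assumptions here. A density of 0.003% corresponds to roughly 3 of every 100,000 tokens. The function name `activation_density` and the sample data are hypothetical.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose feature activation exceeds `threshold`."""
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)


if __name__ == "__main__":
    # Hypothetical sample: the feature fires on 3 of 100,000 tokens.
    acts = [0.0] * 99_997 + [1.2, 0.4, 2.0]
    print(f"{activation_density(acts):.3f}%")  # prints 0.003%
```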