Explanations
negations or limitations in the text
Negative Logits
sharp      -0.19
leigh      -0.17
youngest   -0.16
sharp      -0.16
weakest    -0.15
Sharp      -0.14
smallest   -0.14
Sharp      -0.14
fairly     -0.14
845        -0.14
Positive Logits
more       0.60
更多       0.52  (Chinese: "more")
MORE       0.49
more       0.47
greater    0.47
lebih      0.46  (Indonesian/Malay: "more")
больше     0.44  (Russian: "more")
-more      0.42
daha       0.42  (Turkish: "more")
więcej     0.42  (Polish: "more")
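Feature dashboards commonly derive the two logit lists above by projecting the feature's decoder direction through the model's unembedding matrix, then taking the most negative and most positive entries. The sketch below assumes that definition. It ignores the final LayerNorm and any scaling a real dashboard may apply. The vocabulary, `DECODER_DIR`, and `W_UNEMBED` values are hypothetical toy data.

```python
from typing import List, Sequence, Tuple

# Hypothetical toy data: a 2-dimensional residual stream and a 4-token vocabulary.
VOCAB = ["more", "greater", "sharp", "weakest"]
DECODER_DIR = [1.0, 0.5]  # assumed feature decoder direction (length d_model)
W_UNEMBED = [  # assumed unembedding, d_model x vocab_size, row-major
    [0.4, 0.3, -0.1, -0.05],
    [0.4, 0.3, -0.2, -0.2],
]


def feature_logits(decoder_dir: Sequence[float],
                   w_unembed: Sequence[Sequence[float]]) -> List[float]:
    """Dot the decoder direction with each vocabulary column of the unembedding."""
    vocab_size = len(w_unembed[0])
    return [
        sum(d * row[v] for d, row in zip(decoder_dir, w_unembed))
        for v in range(vocab_size)
    ]


def top_and_bottom(logits: Sequence[float], vocab: Sequence[str], k: int
                   ) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and the k most positive (token, logit) pairs."""
    ranked = sorted(zip(vocab, logits), key=lambda pair: pair[1])
    bottom = ranked[:k]     # most negative first
    top = ranked[::-1][:k]  # most positive first
    return bottom, top


logits = feature_logits(DECODER_DIR, W_UNEMBED)
bottom, top = top_and_bottom(logits, VOCAB, k=2)
for token, value in top:
    print(f"{token:>10} {value:+.2f}")
```

In a real model the same projection runs over the full vocabulary, typically with k = 10 to match the lists shown here.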
Activation density: 0.003%
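Activation density is usually reported as the percentage of sampled tokens on which the feature is active. This sketch assumes "active" means an activation strictly above a threshold of 0. The threshold and the token sample behind the 0.003% figure are not given here. The sample below is hypothetical and only shows the arithmetic: 3 firing tokens out of 100,000 gives 0.003%.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of sampled tokens whose activation exceeds `threshold` (assumed definition)."""
    if len(activations) == 0:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical sample: 100,000 tokens, of which the feature fires on 3.
acts = [0.0] * 100_000
for i in (17, 40_000, 99_999):
    acts[i] = 1.2
density = activation_density(acts)
print(f"Activation density: {density:.3f}%")  # prints "Activation density: 0.003%"
```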