Explanations
phrases indicating partial reasons or contributions
Negative Logits
mostly            -0.23
mainly            -0.21
solely            -0.20
primarily         -0.19
sole              -0.19
inkel             -0.18
mostly            -0.17
sole              -0.17
either            -0.17
principalmente    -0.17
Positive Logits
due               0.21
because           0.19
Due               0.19
due               0.19
responsible       0.18
çͱäºİ             0.17
Because           0.17
Because           0.16
because           0.16
Due               0.16
Activations Density 0.035%
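
The density figure above is commonly read as the fraction of token positions on which the feature fires at all. The sketch below illustrates that reading under stated assumptions: the function name `activation_density`, the zero threshold, and the sample data are all hypothetical and are not taken from the tool that produced this dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: a feature counts as "active" when its activation is
    strictly greater than the threshold (0.0 here).
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 7 active positions out of 20,000 tokens.
sample = [0.0] * 19_993 + [1.2, 0.4, 3.1, 0.9, 2.2, 0.1, 0.7]
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.035%
```

Under this reading, a density of 0.035% means the feature is active on roughly 3 to 4 tokens in every 10,000.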