Explanations
stopping or pausing actions
Negative Logits
distinta      0.89
Different     0.89
വ്യത്യസ്ത      0.86
different     0.86
distinto      0.86
的不同        0.85
différente    0.84
अनेक          0.82
разные        0.82
diferentes    0.81
Positive Logits
altogether    1.50
completely    1.30
unnecessary   1.26
offending     1.18
entirely      1.16
unwanted      1.15
掉            1.13
Completely    1.11
further       1.11
helt          1.07
Activations Density 0.169%