Explanations
phrases that suggest the existence of potential implications or outcomes
Negative Logits
either        -0.19
indeed        -0.18
either        -0.18
both          -0.18
både          -0.18
first         -0.16
Either        -0.15
既            -0.15
accordingly   -0.15
both          -0.15
Positive Logits
other         0.26
also          0.24
equally       0.24
other         0.24
also          0.22
another       0.22
Also          0.20
otras         0.19
également     0.19
ALSO          0.19
Activations Density 0.518%