INDEX
Explanations
phrases indicating a comparison or contrast between different perspectives or approaches
negations or phrases emphasizing what something is not
Negative Logits
*:        -0.62
riber     -0.60
}}}       -0.60
lance     -0.60
stru      -0.60
kees      -0.58
WER       -0.58
FAQ       -0.58
stice     -0.57
[+        -0.55
Positive Logits
necessarily    1.54
epad           1.16
withstanding   1.09
ifying         0.98
merely         0.95
icably         0.92
ifies          0.91
vice           0.88
unlike         0.86
just           0.85
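The two logit lists above rank vocabulary tokens by how strongly the feature pushes their output logits down or up. The sketch below shows one common way such lists are produced. It assumes, without confirmation from this page, that the per-token effect is the feature's decoder direction multiplied by the model's unembedding matrix; the function names `logit_effects` and `top_bottom_logits` and the toy matrices in the usage example are hypothetical.

```python
def logit_effects(decoder_dir, W_U):
    """Direct effect of a feature on each output logit (assumed definition).

    decoder_dir: list of d_model floats (the feature's decoder direction).
    W_U: d_model rows, each a list of vocab_size floats (unembedding matrix).
    Returns effect[t] = sum_i decoder_dir[i] * W_U[i][t].
    """
    vocab_size = len(W_U[0])
    return [
        sum(d * row[t] for d, row in zip(decoder_dir, W_U))
        for t in range(vocab_size)
    ]


def top_bottom_logits(effects, vocab, k=10):
    """Return the k most positive and k most negative (token, effect) pairs."""
    ranked = sorted(zip(vocab, effects), key=lambda pair: pair[1])
    bottom = ranked[:k]       # most negative effects first
    top = ranked[::-1][:k]    # most positive effects first
    return top, bottom


# Toy usage with a 2-dimensional model and a 4-token vocabulary (hypothetical data).
effects = logit_effects(
    [1.0, 0.0],
    [[1.54, -0.62, 0.86, 0.0],
     [0.0, 0.0, 0.0, 0.0]],
)
top, bottom = top_bottom_logits(effects, ["necessarily", "riber", "unlike", "the"], k=2)
print(top)     # [('necessarily', 1.54), ('unlike', 0.86)]
print(bottom)  # [('riber', -0.62), ('the', 0.0)]
```

A dashboard would typically call this with the full vocabulary and k=10, matching the ten entries shown in each list.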
Activation Density: 0.059%
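A density of 0.059% means the feature fires on roughly 59 of every 100,000 tokens, so it is a very sparse feature. The sketch below computes this kind of figure, assuming (the page does not define it) that density is the percentage of sampled tokens whose activation is strictly above zero; the function name `activation_density` and its `threshold` parameter are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of tokens on which the feature is active.

    activations: per-token feature activations over a sample of text.
    threshold: a token counts as active if its activation exceeds this value
               (assumed to be 0.0 for a ReLU-style sparse autoencoder).
    """
    if not activations:
        raise ValueError("need at least one activation to compute a density")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# 59 active tokens out of 100,000 gives the 0.059% shown above.
sample = [1.0] * 59 + [0.0] * 99_941
print(f"{activation_density(sample):.3f}%")  # 0.059%
```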