INDEX
Explanations
phrases that indicate absolutes or strong negations
Negative Logits
auffi       -0.69
theils      -0.67
faſt        -0.64
itſelf      -0.62
Theſe       -0.61
prefi       -0.60
</caption>  -0.60
ainfi       -0.60
ſtill       -0.59
Pompeii     -0.59
Positive Logits
keinerlei        0.92
InputDecoration  0.92
whatsoever       0.88
aucune           0.79
ویکیپدی          0.67
aucun            0.64
ZERO             0.64
毫不             0.63
никаких          0.62
никакого         0.61
Activations Density: 0.327%