Explanations
negations or statements of exclusion
Negative Logits
Bronnen     -0.85
Sardinia    -0.85
Thoma       -0.82
TDA         -0.80
ricardo     -0.76
Tübingen    -0.75
raiſ        -0.74
uyer        -0.73
hammer      -0.73
diana       -0.73
Positive Logits
not         1.27
Not         1.11
NOT         1.07
not         1.02
isNot       0.97
NOT         0.92
Not         0.92
niet        0.91
only        0.86
nicht       0.82
Activations Density: 0.241%