INDEX
Explanations
negation terms or phrases such as "nor" and "neither."
Negative Logits
fwd      -0.68
pezi     -0.66
ittens   -0.66
ARAB     -0.65
SAX      -0.62
wiście   -0.60
bulk     -0.60
Brag     -0.59
Turn     -0.58
tı       -0.58
Positive Logits
neither   1.24
Nor       1.20
neither   1.18
nor       1.17
nor       1.12
NOR       1.12
Neither   1.09
Neither   1.08
Nor       1.07
theless   1.05
Activations Density 0.068%