INDEX
Explanations
words related to negation
repetitive phrases punctuated with commas or specific qualifiers
Negative Logits
ounded         -0.67
gypt           -0.64
ende           -0.62
alone          -0.60
Equip          -0.59
oufl           -0.58
mud            -0.57
feasibility    -0.56
chains         -0.56
Klux           -0.56
Positive Logits
sir            0.84
whatsoever     0.80
onsense        0.79
nor            0.73
thank          0.68
except         0.65
æĪ             0.61
Answer         0.61
Shift          0.60
exaggeration   0.60
Activations Density: 0.053%