INDEX
Explanations
phrases indicating logical reasoning or argumentation
expressions related to making claims and denials
Negative Logits
ortment        -0.84
pour           -0.69
itton          -0.67
naissance      -0.60
largeDownload  -0.59
newcom         -0.58
soon           -0.58
doubtless      -0.58
nown           -0.58
Kings          -0.58
Positive Logits
anymore        1.64
slightest      1.10
nor            1.07
anywhere       1.07
any            1.04
anything       1.00
necessarily    0.98
whatsoever     0.91
anybody        0.90
enough         0.82
Activations Density: 0.209%