Explanations
phrases related to negative events or controversial topics
instances of the word "the"
Negative Logits
leeve        -0.72
IFA          -0.67
ItemLevel    -0.67
Slot         -0.67
iffe         -0.65
click        -0.65
cture        -0.65
oken         -0.63
MK           -0.62
*            -0.62
Positive Logits
ensuing       1.35
resultant     1.24
resulting     1.19
accompanying  1.17
remainder     1.14
consequ       1.12
slightest     1.09
vast          1.06
latter        1.06
entire        1.04
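The two lists above rank vocabulary tokens by how strongly this feature pushes their logits down or up. A minimal sketch of how such lists are commonly produced, assuming (not confirmed by this page) that the per-token effect is the feature's decoder direction projected through the model's unembedding matrix. The function names `logit_effects` and `top_logit_tokens` and the toy vocabulary are hypothetical; a few values are copied from the lists above purely for illustration.

```python
import numpy as np


def logit_effects(decoder_direction, W_U):
    """Assumed definition: direct effect of the feature on every token logit.

    decoder_direction has shape (d_model,), W_U has shape (d_model, d_vocab).
    """
    return decoder_direction @ W_U


def top_logit_tokens(effects, vocab, k=10):
    """Return the k most negative and k most positive (token, value) pairs."""
    order = np.argsort(effects)  # ascending: most negative first
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return negative, positive


# Toy example (hypothetical vocabulary; values borrowed from the lists above).
vocab = ["leeve", "IFA", "the", "ensuing", "resultant", "vast"]
effects = np.array([-0.72, -0.67, 0.01, 1.35, 1.24, 1.06])
neg, pos = top_logit_tokens(effects, vocab, k=2)
print(neg)  # [('leeve', -0.72), ('IFA', -0.67)]
print(pos)  # [('ensuing', 1.35), ('resultant', 1.24)]

# Shape check for the assumed projection with random toy matrices.
rng = np.random.default_rng(0)
toy = logit_effects(rng.normal(size=8), rng.normal(size=(8, len(vocab))))
print(toy.shape)  # (6,)
```

Sorting once and slicing both ends keeps the negative and positive lists consistent with each other; real dashboards may also filter rare or malformed tokens, which this sketch does not attempt.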
Activation Density 0.307%
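Activation density is usually read as the share of sampled tokens on which the feature fires. A minimal sketch, assuming "fires" means an activation strictly above zero (the dashboard does not state its threshold or sample size); the function name `activation_density` and the 100,000-token toy sample are hypothetical.

```python
import numpy as np


def activation_density(activations, threshold=0.0):
    """Percentage of tokens whose activation exceeds `threshold` (assumed 0)."""
    activations = np.asarray(activations)
    return 100.0 * np.count_nonzero(activations > threshold) / activations.size


# Toy sample: the feature is active on 307 of 100,000 tokens.
acts = np.zeros(100_000)
acts[:307] = 1.5
print(f"{activation_density(acts):.3f}%")  # 0.307%
```

A density this low means the feature is silent on the vast majority of tokens, which is the sparsity an SAE is trained to encourage.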