INDEX
Explanations
sentences containing negations, particularly focusing on negations with high confidence
negations and phrases indicating something is not true or does not exist
Negative Logits
arta              -0.87
=-=-=-=-=-=-=-=-  -0.77
VIDEOS            -0.69
Dialogue          -0.68
newsletters       -0.68
anon              -0.66
mare              -0.66
ologies           -0.66
NOW               -0.63
certs             -0.63
Positive Logits
conceived         1.04
unsuccessful      1.04
originally        1.02
instrumental      0.99
initially         0.98
born              0.93
successful        0.91
intended          0.87
able              0.86
supposed          0.83
Activations Density: 0.283%