Explanations
words related to contradiction or opposition
references to contradictions and oppositional concepts
Negative Logits
Token        Logit
istics       -0.72
Mehran       -0.71
NetMessage   -0.70
Tycoon       -0.68
Assass       -0.66
FSA          -0.65
Nanto        -0.65
throats      -0.62
istically    -0.62
eele         -0.61
Positive Logits
Token        Logit
contra       1.13
ption        1.08
ptions       1.08
ventions     0.88
vention      0.85
ven          0.85
asca         0.82
ctr          0.74
offensive    0.74
xon          0.74
Activations Density: 0.010%