INDEX
Explanations
phrases related to contradiction or violation
terms related to "contradiction" and "contraventions."
Negative Logits
Token      Logit
eele       -0.76
istics     -0.71
Mehran     -0.69
doms       -0.68
GOODMAN    -0.66
Nanto      -0.66
Assass     -0.65
FSA        -0.64
LCS        -0.63
throats    -0.63
Positive Logits
Token      Logit
ptions     1.23
ption      1.20
contra     0.99
ven        0.95
ventions   0.94
asca       0.94
vention    0.88
coni       0.76
ctr        0.76
vers       0.73
Activations Density 0.025%
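
The density figure above is most naturally read as the share of sampled tokens on which this feature fires. The page does not say how it was computed, so the following is only a minimal sketch under that assumption: the function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical and are not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" means the share of sampled tokens on which the
    feature is active. The dashboard itself does not document this.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: the feature fires on 1 of 4000 tokens.
sample = [0.0] * 3999 + [2.5]
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.025% corresponds to roughly one active token in every 4,000, which fits a feature tied to a narrow family of words such as "contravention".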