Explanations
words related to contradiction or opposing viewpoints
Negative Logits
Token       Logit
iliz        -0.18
塚          -0.17
icide       -0.16
uba         -0.15
ンク        -0.15
istrat      -0.14
scribe      -0.14
ILA         -0.14
ificance    -0.14
im          -0.14
Positive Logits
Token       Logit
contr       0.30
CONTR       0.24
Contr       0.21
ictory      0.21
Contr       0.20
contr       0.19
ition       0.18
ived        0.17
actions     0.17
433         0.17
Activations Density 0.009%