INDEX
Explanations
phrases related to issues of disagreement or conflict
Negative Logits
756          -0.16
allis        -0.14
775          -0.14
.SE          -0.14
CanBe        -0.13
785          -0.13
Hilton       -0.13
Hern         -0.13
IDEOGRAPH    -0.13
èŃ           -0.13
Positive Logits
moda         0.15
ôm          0.15
izza         0.15
æ³¥          0.14
ibble        0.14
amet         0.14
ìĪĻ          0.14
destin       0.14
лÑĥÑĩ       0.14
mod          0.14
Activations Density: 0.168%