INDEX
Explanations
contrasting statements or phrases
Negative Logits
ummings  -0.17
idis     -0.15
ifer     -0.15
wi       -0.15
elize    -0.14
arus     -0.14
wie      -0.14
jur      -0.14
peers    -0.14
ledge    -0.13
Positive Logits
contrary   0.30
opposite   0.21
pill       0.20
Pill       0.19
zac        0.16
信         0.15
azi        0.15
exact      0.15
contrario  0.15
OP         0.15
Activations Density: 0.020%