Explanations
terms related to contrasts between different options or perspectives
Negative Logits
deen      -0.83
den       -0.67
inflamm   -0.67
ENTS      -0.66
Tickets   -0.66
aband     -0.65
uled      -0.65
Regist    -0.62
azeera    -0.62
ITED      -0.61
Positive Logits
between      1.51
between      1.29
Between      1.20
BET          1.02
otomy        0.99
separating   0.92
Between      0.82
inherent     0.81
dilemma      0.81
favoring     0.79
Activations Density: 0.143%