INDEX
Explanations
phrases related to controversial social or political topics
phrases about laws or regulations
Negative Logits
ipeg      -0.68
itarian   -0.65
aval      -0.65
isen      -0.64
oll       -0.64
rend      -0.61
atre      -0.61
uber      -0.60
hatt      -0.59
ymph      -0.59
Positive Logits
thereby      1.13
citing       1.07
including    0.99
preferring   0.97
allowing     0.94
noting       0.93
opting       0.93
excluding    0.93
implying     0.92
whereby      0.92
Activations Density: 0.381%