INDEX
Explanations
proper names or entities, particularly when accompanied by a possessive apostrophe 's
certain characters or symbols used in the text
Negative Logits
democracy           -0.67
``                  -0.62
Mobil               -0.60
dignity             -0.58
division            -0.58
uranium             -0.57
Democracy           -0.57
neut                -0.56
chart               -0.56
................    -0.55
Positive Logits
s                   1.48
t                   1.06
sn                  1.01
dq                  1.01
shall               0.99
sat                 0.97
d                   0.97
sed                 0.96
should              0.95
Pg                  0.95
Activations Density 0.302%