INDEX
Explanations
words related to urging or advising actions or behaviors
references to groups of people being encouraged or advised to take specific actions
Negative Logits
atile     -0.67
mys       -0.64
ELD       -0.61
Built     -0.58
acebook   -0.54
certs     -0.53
Rated     -0.51
ILA       -0.50
ater      -0.50
anka      -0.50
Positive Logits
beware        1.08
to            1.01
against       0.92
not           0.90
towards       0.89
toward        0.87
not           0.85
NOT           0.84
everywhere    0.82
accordingly   0.80
Activations Density 0.135%