INDEX
Explanations
phrases that indicate monetary contributions or funding to organizations
Negative Logits
istas        -0.69
fooled       -0.68
wiser        -0.67
Effects      -0.66
artifacts    -0.65
portray      -0.64
Cosponsors   -0.63
reddits      -0.63
moderators   -0.63
oday         -0.62
Positive Logits
scription    0.75
course       0.73
sted         0.71
NRS          0.69
ESV          0.64
Apostle      0.64
arthed       0.63
Entry        0.62
TAMADRA      0.62
Lists        0.61
Activations Density: 0.036%