INDEX
Explanations
opinions or statements made by people
phrases that indicate people's opinions or statements
Negative Logits
nton            -0.67
resumed         -0.65
ubric           -0.64
ailability      -0.64
Written         -0.63
arious          -0.63
andestine       -0.62
acted           -0.61
syn             -0.60
ratified        -0.60
Positive Logits
excuses          0.77
incorrectly      0.73
mares            0.72
assetsadobe      0.69
arrogance        0.67
ignorance        0.66
misconceptions   0.66
underest         0.66
cms              0.65
myths            0.65
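The page does not say how the two logit lists are produced. A common convention for sparse-autoencoder feature dashboards is to project the feature's decoder direction through the model's unembedding matrix and list the vocabulary entries with the most negative and most positive scores. The sketch below assumes that convention. The function names `logit_effects` and `top_logits` are hypothetical, and the toy inputs are illustrative only, not this feature's real weights.

```python
from typing import List, Sequence, Tuple


def logit_effects(decoder_dir: Sequence[float],
                  unembed: Sequence[Sequence[float]]) -> List[float]:
    """Assumed method: project a feature's decoder direction through W_U.

    decoder_dir: vector of length d_model (the feature's decoder row).
    unembed: d_model rows, each of length vocab_size (the unembedding W_U).
    Returns one logit contribution per vocabulary entry.
    """
    vocab_size = len(unembed[0])
    return [
        sum(decoder_dir[i] * unembed[i][t] for i in range(len(decoder_dir)))
        for t in range(vocab_size)
    ]


def top_logits(effects: Sequence[float], vocab: Sequence[str], k: int = 10
               ) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and the k most positive (token, score) pairs."""
    ranked = sorted(zip(vocab, effects), key=lambda pair: pair[1])
    negative = ranked[:k]          # most negative first
    positive = ranked[::-1][:k]    # most positive first
    return negative, positive


# Toy example: d_model = 2, vocabulary of three tokens.
effects = logit_effects([1.0, -0.5], [[1.0, 0.0, 2.0], [0.0, 2.0, 0.0]])
neg, pos = top_logits(effects, ["a", "b", "c"], k=1)
print(neg, pos)
```

With k=10 and a full vocabulary, the two returned lists would correspond to the Negative Logits and Positive Logits columns above, under the stated assumption.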
Activation Density: 0.334%
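Assuming activation density means the percentage of sampled tokens on which the feature's activation is above zero (the page does not define it), a figure such as 0.334% could be computed as in this minimal sketch. The function name `activation_density` and its `threshold` parameter are hypothetical.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds `threshold` (assumed definition)."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Toy example: the feature fires on 1 of 4 tokens.
density = activation_density([0.0, 0.0, 0.5, 0.0])
print(f"{density:.3f}%")
```

A density this low means the feature is silent on the vast majority of tokens, which is the usual goal for a sparse feature.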