INDEX
Explanations
phrases related to inflation and exaggeration
terms related to inflated or exaggerated claims and conditions
Negative Logits
abet             -0.98
atche            -0.80
sf               -0.74
clerosis         -0.71
abiding          -0.71
sen              -0.71
arij             -0.71
abetes           -0.70
ittee            -0.69
avis             -0.69
Positive Logits
inflated          1.14
corrid            0.92
representations   0.84
caric             0.82
versions          0.81
exaggerated       0.78
expectations      0.76
explanations      0.74
ulated            0.73
proport           0.73
Activations Density: 0.018%