Explanations
phrases related to offers and updates
Negative Logits
glac        -0.73
virginity   -0.72
manif       -0.71
exerc       -0.69
agall       -0.69
appropri    -0.67
wholes      -0.66
symp        -0.66
pellets     -0.64
manners     -0.64
Positive Logits
-+-+-+-+        0.95
Previous        0.94
MORE            0.92
Correction      0.91
Advertisement   0.91
SHARES          0.89
Original        0.88
Update          0.85
Plot            0.84
Previous        0.83
Activations Density: 0.064%