INDEX
Explanations
compliments or positive remarks
expressions of enjoyment or positive feedback
Negative Logits
otomy        -0.76
aults        -0.75
ouston       -0.75
ossession    -0.74
moil         -0.72
istence      -0.71
odan         -0.69
otom         -0.68
contingency  -0.67
incial       -0.67
Positive Logits
congr            1.10
Especially       0.89
Reviewer         0.85
Awesome          0.83
thank            0.81
congratulations  0.79
Beautiful        0.78
huh              0.77
especially       0.76
albeit           0.76
Activations Density: 0.494%