INDEX
Explanations
references to actions or situations that are considered gratifying
terms related to gratification and its variations
Negative Logits
aan     -0.79
ez      -0.78
oğ      -0.77
anos    -0.75
ey      -0.75
icut    -0.74
away    -0.73
ese     -0.72
rio     -0.72
icted   -0.71
Positive Logits
tesy          0.91
uitous        0.88
TRY           0.86
ULTS          0.84
ャ            0.83
CONT          0.76
sidx          0.75
++++          0.74
icing         0.72
guiActiveUn   0.72
Activations Density: 0.081%