INDEX
Explanations
terms related to positive experiences and beneficial effects
favorable effects on outcomes
Negative Logits
defaultstate    -0.38
chien           -0.36
idleness        -0.32
Thacker         -0.31
kurat           -0.31
privées         -0.31
שוליים          -0.30
Artifact        -0.30
Rug             -0.30
başına          -0.30
Positive Logits
favorably       0.68
favourably      0.66
positive        0.65
Positive        0.65
Positive        0.65
positive        0.65
positif         0.64
POSITIVE        0.62
favourable      0.60
favorable       0.60
Activations Density 0.194%