Explanations
themes related to social dynamics and personal relationships
Negative Logits
Token       Logit
ually       -0.20
atically    -0.20
ziej        -0.19
atively     -0.18
ymous       -0.18
ergic       -0.18
GED         -0.17
bable       -0.17
ially       -0.16
ingly       -0.16
Positive Logits
Token       Logit
ness        1.16
ity         0.85
NESS        0.80
eness       0.56
iness       0.53
ITY         0.50
itude       0.50
heid        0.49
ality       0.47
ism         0.47
Activations Density: 0.192%