INDEX
Explanations
intense interaction with others
Negative Logits
envers      0.49
Toward      0.45
Shared      0.43
menuju      0.43
Paul        0.42
Jane        0.42
⊷           0.41
terhadap    0.41
对          0.40
Lear        0.40
Positive Logits
incessantly     0.49
harshly         0.47
verbally        0.44
coldly          0.44
relentlessly    0.42
merciless       0.41
fierc           0.41
sexually        0.40
excessively     0.39
socially        0.39
Activations Density: 0.032%