Explanations
negative emotions and actions
Negative Logits
öre           0.64
External      0.61
ette          0.59
Possible      0.59
apparently    0.59
Compliance    0.58
চলতি          0.58
apparently    0.58
ણે            0.57
Apparently    0.57
Positive Logits
jealousy      0.97
bullies       0.96
heartache     0.90
stealing      0.89
dissection    0.84
heartbreak    0.84
witnessing    0.83
scammers      0.81
prank         0.81
admiration    0.80
Activations Density 0.000%