Explanations
words or phrases related to communication actions, such as calling, emailing, and messaging
instances of the word "and"
Negative Logits
Token             Logit
mite              -0.71
utenberg          -0.69
LOCK              -0.63
adish             -0.63
aepernick         -0.63
gur               -0.61
disproportion     -0.61
hiber             -0.60
disproportionate  -0.60
attRot            -0.60
Positive Logits
Token             Logit
asked             1.16
inquired          1.15
asks              1.03
begged            1.02
congratulated     0.98
thanked           0.96
ask               0.95
apologized        0.94
gave              0.91
told              0.91
Activations Density 0.221%