INDEX
Explanations
phrases relating to speaking or acting on someone's behalf
phrases that indicate representation or advocacy for someone else
Negative Logits
Token                             Logit
DIT                               -0.73
HUD                               -0.67
################################  -0.64
Fulton                            -0.64
Article                           -0.62
teen                              -0.62
Extrem                            -0.61
TT                                -0.61
Traps                             -0.60
ANG                               -0.59
Positive Logits
Token         Logit
behalf        1.13
guiActiveUn   0.80
auga          0.76
maid          0.75
wcsstore      0.69
coerc         0.68
ouched        0.67
indal         0.67
oux           0.67
ende          0.66
Activations Density 0.005%