INDEX
Explanations
phrases indicating giving something to someone or providing information
commands or requests directed at others
Negative Logits
éĹ            -0.74
lit           -0.71
istration     -0.67
*/(           -0.67
cffffcc       -0.66
phyl          -0.65
discharge     -0.64
pher          -0.62
ļé            -0.61
controlling   -0.61
Positive Logits
Your          0.98
Yourself      0.91
Them          0.88
Now           0.84
Help          0.80
Take          0.80
Injury        0.78
Take          0.77
Leave         0.76
Give          0.76
Activations Density 0.111%
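
As a rough illustration of what a density figure like the one above usually means, the sketch below computes the share of tokens on which a feature fires. It assumes density is defined as the percentage of tokens with activation above zero; the page itself does not state the definition, and the function name, threshold, and sample data are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of tokens whose activation exceeds `threshold`.

    Assumption: "density" means the fraction of tokens on which the
    feature is active, expressed as a percentage.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical data: the feature fires on 2 of 1,000 tokens.
sample = [0.0] * 998 + [1.5, 0.3]
density = activation_density(sample)
print(f"{density:.3f}%")
```

Under this assumed definition, a density of 0.111% would correspond to the feature firing on roughly one token in every 900.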