INDEX
Explanations
phrases related to communication and giving instructions
instances of direct speech or reported dialogue
Negative Logits
Token            Logit
aghd             -0.72
oshenko          -0.71
idates           -0.69
ses              -0.61
exploits         -0.61
controversies    -0.60
obos             -0.57
ãĥ³              -0.57
operates         -0.56
¥µ               -0.56
Positive Logits
Token            Logit
myself           1.96
my               1.50
mine             1.03
MY               0.88
yss              0.85
My               0.77
ourselves        0.75
oan              0.73
him              0.73
my               0.72
Activations Density: 0.446%
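
For readers unfamiliar with the density figure, the sketch below shows one common way such a number can be computed: the fraction of token positions on which a feature's activation is above zero. This definition is an assumption, since the page does not state how its density is calculated, and the function name `activation_density` and the sample data are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of token positions where the
    feature fires at all; the source page does not define the term.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical activations for 1000 token positions: 4 fire, 996 do not.
sample = [0.0] * 996 + [1.2, 0.3, 2.5, 0.8]
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.446% would mean the feature fires on roughly 4 or 5 of every 1000 tokens.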