Explanations
phrases indicating information or instructions given to others
instances of the phrase "we were told."
Negative Logits
Labor        -0.74
adesh        -0.68
ouf          -0.68
hur          -0.65
cession      -0.65
aband        -0.64
ent          -0.63
aho          -0.63
avez         -0.61
ivot         -0.61
Positive Logits
tale          0.80
llor          0.72
ariat         0.71
ÃĽ            0.68
repeatedly    0.68
perspect      0.67
ļé            0.67
proced        0.65
Īè            0.65
aback         0.65
Activations Density 0.027%