Explanations
bold actions or statements
Negative Logits
Token      Logit
OTOS       -0.90
Cheong     -0.87
enfranch   -0.65
duly       -0.64
ADS        -0.63
utra       -0.63
AW         -0.62
yip        -0.62
apolis     -0.62
nesota     -0.61
Positive Logits
Token      Logit
faced      1.20
er         1.09
ness       1.05
face       0.98
est        0.89
mouth      0.89
nesses     0.84
ly         0.82
bold       0.81
word       0.81
Activations Density: 0.028%