INDEX
NEGATIVE LOGITS
y               -1.04
Y               -0.88
Y               -0.76
way             -0.70
yi              -0.57
vy              -0.56
WAY             -0.56
yat             -0.55
ya              -0.53
𝑦               -0.53
POSITIVE LOGITS
Efq             0.88
itſelf          0.85
myſelf          0.83
ſelf            0.80
whofe           0.75
WriteTagHelper  0.75
Shakspeare      0.74
ſelves          0.73
doubtnut        0.73
ſche            0.71
Activations Density 0.109%