Negative Logits
Token      Logit
y          -0.84
t          -0.73
“          -0.73
y          -0.70
Robert     -0.64
al         -0.63
l          -0.63
Charles    -0.63
l          -0.62
’          -0.61
Positive Logits
Token      Logit
-->        1.52
]-->       1.47
-->        1.29
itſelf     1.19
myſelf     1.19
-->        1.17
للاسماء    1.16
*/}        1.15
raiſ       1.14
-->>       1.14
Activations Density 0.036%