INDEX
Explanations
references to deceased individuals
Negative Logits
uset          -0.78
Brind         -0.75
ویکیپدیای     -0.74
bbons         -0.74
#+#           -0.73
&___          -0.73
riwal         -0.73
ثيق           -0.72
rostis        -0.70
toothbrush    -0.69
Positive Logits
LATE          2.35
late          2.29
Late          2.27
Late          2.21
LATE          2.09
late          1.92
early         1.31
early         1.26
Early         1.25
Early         1.24
Activations Density: 0.051%