Explanations
opportunities and early warning
Negative Logits
Token    Logit
it       0.59
be       0.55
ال       0.49
يل       0.42
are      0.41
It       0.41
as       0.40
Ti       0.39
تم       0.38
         0.38
Positive Logits
Token    Logit
n        0.52
ა        0.50
.}(      0.49
h        0.49
._       0.49
(        0.48
.(       0.47
9        0.46
.³       0.46
грамма   0.46
Activations Density 0.099%