Explanations
ages and years
age and years
Negative Logits
ي    0.81
i    0.72
and    0.69
י    0.64
ای    0.63
л    0.58
u    0.55
ু    0.54
for    0.54
ت    0.53
Positive Logits
trivia    0.46
вспо    0.45
reprim    0.44
decades    0.43
底层    0.42
Jahrze    0.42
itinerant    0.42
annih    0.42
adolesc    0.41
jaren    0.41
Activations Density 0.672%