INDEX
Explanations
quantitative descriptors of prominence or superiority, such as "largest," "best," and "most."
Negative Logits
تضيفلها	-0.67
transférez	-0.66
فريبيس	-0.62
vulgaires	-0.57
Colleagues	-0.57
effray	-0.56
становника	-0.56
дописавши	-0.55
WriteLiteral	-0.55
تانيه	-0.55
Positive Logits
single	0.69
iest	0.61
urably	0.60
non	0.60
thing	0.58
asson	0.57
pure	0.57
holi	0.56
niest	0.54
private	0.54
Activations Density 0.203%