Explanations
references to historical figures, names, or prominent individuals
Negative Logits
ویکیپدیا  -0.58
rumahnya  -0.58
unhofer  -0.57
nakalista  -0.56
ngOn  -0.56
rungsseite  -0.55
насељу  -0.55
sélectionnés  -0.54
attentes  -0.54
yaszt  -0.54
Positive Logits
used  1.15
เคย  0.92
used  0.92
Used  0.91
Used  0.91
operated  0.88
USED  0.88
once  0.86
pernah  0.83
từng  0.83
Activations Density 0.505%