Explanations
instances of high numerical values
Negative Logits
ioutil          -0.57
saveiro         -0.57
MethodManager   -0.57
calendriers     -0.56
indígen         -0.54
ویکیپدی         -0.54
vlasy           -0.52
genodigd        -0.51
gehouden        -0.51
hilsen          -0.51
Positive Logits
↵↵↵              0.78
↵↵↵↵             0.68
↵↵↵↵↵            0.64
Stahl            0.56
↵↵↵↵↵↵↵          0.55
↵↵↵↵↵↵↵↵         0.54
↵↵↵↵↵↵↵↵↵        0.53
↵↵↵↵↵↵           0.52
McCle            0.50
mstyle           0.48
Activation density: 0.018%
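For a sense of scale, the sketch below shows one way a density figure like this could be computed. It assumes that "density" means the percentage of token positions on which the feature's activation is strictly above zero; the function name `activation_density` and the example data are hypothetical and are not taken from the dashboard itself.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Return the percentage of token positions whose activation exceeds `threshold`.

    Assumption: a token counts as "active" when its activation is strictly
    greater than the threshold (0.0 by default).
    """
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:
            active += 1
    if total == 0:
        raise ValueError("no activations supplied")
    return 100.0 * active / total


# Hypothetical data: 18 active positions out of 100,000 tokens.
acts = [0.0] * 99_982 + [1.0] * 18
print(f"{activation_density(acts):.3f}%")  # prints "0.018%"
```

Under this assumption, a density of 0.018% corresponds to the feature firing on roughly 18 of every 100,000 tokens.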