INDEX
Explanations
a specific pattern in Hebrew characters
non-standard characters or symbols
Negative Logits
mutants        -0.84
blacklist      -0.80
Borderlands    -0.79
factions       -0.78
Coul           -0.71
orbiting       -0.71
overlapping    -0.71
impuls         -0.71
demos          -0.70
unexpectedly   -0.69
Positive Logits
à¤     2.68
à¥     2.59
ा      2.54
à¤     2.17
à¨     1.65
ر      1.58
à©     1.55
à¦     1.47
س      1.41
×Ļ×    1.40
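
Several of the positive-logit tokens above (à¤, à¥, à¨, à©, à¦) look like partial UTF-8 byte sequences displayed one character per byte, which is how byte-level tokenizers often render fragments of non-Latin scripts. The sketch below is only an illustration under that assumption: it treats each displayed character as a single Latin-1 byte and reports which code points the two-byte prefix could complete to. The helper names `token_bytes` and `codepoint_range` are hypothetical and not part of any dashboard or tokenizer API.

```python
from typing import Tuple


def token_bytes(display: str) -> bytes:
    """Map a displayed token to raw bytes.

    Assumption: every displayed character stands for exactly one byte
    (a Latin-1 view of the token). This does not hold for tokens such as
    the last row of the list, which uses a different byte-to-character mapping.
    """
    return display.encode("latin-1")


def codepoint_range(prefix: bytes) -> Tuple[int, int]:
    """Return the lowest and highest code points that a two-byte prefix of a
    three-byte UTF-8 sequence can complete to."""
    lowest = (prefix + b"\x80").decode("utf-8")   # smallest continuation byte
    highest = (prefix + b"\xbf").decode("utf-8")  # largest continuation byte
    return ord(lowest), ord(highest)


if __name__ == "__main__":
    for token in ["à¤", "à¥", "à¦", "à¨", "à©"]:
        raw = token_bytes(token)
        low, high = codepoint_range(raw)
        print(f"{token!r}: bytes {raw.hex()} -> U+{low:04X}..U+{high:04X}")
```

Under this reading, à¤ and à¥ are prefixes of Devanagari characters, à¦ of Bengali, and à¨ and à© of Gurmukhi, so the positive logits point to non-Latin scripts more broadly than the Hebrew-specific explanation suggests.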
Activations Density 0.009%