Explanations
No Explanations Found
Negative Logits
LookAnd            -0.85
InjectAttribute    -0.79
beginnetje         -0.69
<=",               -0.67
Roskov             -0.66
Мексичка           -0.66
RegistryLite       -0.66
IsContent          -0.65
NKC                -0.64
مرئيه              -0.63
Positive Logits
seamnă             0.50
toBeTruthy         0.47
Tema               0.46
streng             0.45
tower              0.44
νομα               0.44
سياسي              0.43
atimes             0.42
humanité           0.41
Tema               0.41
Activations Density 0.000%
No Known Activations
This feature has no known activations.