Explanations
No Explanations Found
Negative Logits
Token     Logit
antro     -0.15
onest     -0.15
νο        -0.15
лаб       -0.14
港        -0.14
psilon    -0.14
apers     -0.13
aurant    -0.13
habi      -0.13
VL        -0.13
Positive Logits
Token        Logit
Native       0.47
Native       0.42
native       0.38
Indigenous   0.35
indigenous   0.33
natives      0.30
native       0.30
.Native      0.30
/native      0.30
.native      0.29
Activations Density: 0.000%
No Known Activations
This feature has no known activations.