Explanations
No Explanations Found
Negative Logits
armor      -0.17
probe      -0.15
ige        -0.15
Keto       -0.15
plate      -0.15
armor      -0.15
probes     -0.14
³          -0.14
armored    -0.14
harbor     -0.14
Positive Logits
Iran       0.29
Iranian    0.28
Tehran     0.26
Iranians   0.25
iran       0.24
Iran       0.24
gays       0.22
gay        0.22
western    0.20
İran       0.20
Activations Density: 0.000%
No Known Activations
This feature has no known activations.