INDEX
Explanations
No Explanations Found
Negative Logits
NES              -0.67
helle            -0.67
pas              -0.66
wic              -0.66
boats            -0.64
ICS              -0.63
Williams         -0.62
Bs               -0.62
eson             -0.61
oby              -0.61
Positive Logits
icial             0.74
ours              0.70
lore              0.68
ingen             0.68
sorts             0.68
olon              0.67
theirs            0.65
orsi              0.62
Representatives   0.61
Tag               0.61
Activations Density 0.000%
No Known Activations
This feature has no known activations.