Explanations
No Explanations Found
Negative Logits
Token           Logit
Copenhagen      -0.74
feature         -0.70
xes             -0.69
Europeans       -0.68
interstitial    -0.68
Danish          -0.65
Travels         -0.63
arget           -0.63
crop            -0.62
anwhile         -0.62
Positive Logits
Token           Logit
Jr              0.72
Guard           0.65
Contents        0.64
uncont          0.64
yon             0.63
uer             0.63
tarian          0.63
unia            0.60
gt              0.60
tears           0.60
Activations Density: 0.000%
No Known Activations
This feature has no known activations.