Explanations
No Explanations Found
Negative Logits
Token        Logit
zanne        -0.82
bern         -0.79
thriving     -0.76
Foss         -0.74
Tanz         -0.68
steen        -0.67
aepernick    -0.65
wikipedia    -0.65
radical      -0.65
ternity      -0.65
Positive Logits
Token        Logit
reads        0.80
ips          0.73
follows      0.70
oon          0.70
ously        0.70
icles        0.68
ence         0.68
ous          0.67
uin          0.65
''.          0.65
Activations Density: 0.000%
No Known Activations
This feature has no known activations.