Explanations
No Explanations Found
Negative Logits
bern  -0.92
İĭ  -0.83
romancer  -0.81
alian  -0.79
ividual  -0.74
rahim  -0.74
rises  -0.71
jobs  -0.70
site  -0.69
kefeller  -0.69
Positive Logits
pac  0.81
wound  0.70
amput  0.66
arch  0.64
Fer  0.64
di  0.64
HI  0.63
hurd  0.63
Philipp  0.62
draft  0.62
Activation Density: 0.000%
This feature has no known activations.