Explanations
No Explanations Found
Negative Logits
roman                -0.83
ourgeois             -0.79
senal                -0.76
defensively          -0.67
rife                 -0.66
abound               -0.65
arc                  -0.65
antage               -0.64
=================    -0.63
fusc                 -0.63
Positive Logits
atri                  0.81
Truman                0.66
Tel                   0.66
Siri                  0.62
IRO                   0.62
husbands              0.62
sang                  0.61
arta                  0.61
IPP                   0.60
Independence          0.59
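The negative and positive logit lists rank vocabulary tokens by how strongly this feature pushes their output logits down or up. A common way to produce such lists, assumed here rather than confirmed by this page, is to project the feature's decoder direction onto the model's unembedding matrix and sort the scores. The sketch below uses NumPy; `logit_lists`, `decoder_direction`, `W_U`, and the toy vocabulary are hypothetical names chosen for illustration.

```python
import numpy as np


def logit_lists(decoder_direction, W_U, vocab, k=10):
    """Return the k most negative and k most positive (token, score) pairs.

    Assumption (hypothetical setup): decoder_direction has shape (d_model,)
    and is the feature's decoder row; W_U has shape (d_model, d_vocab) and
    is the unembedding matrix; vocab holds d_vocab token strings.
    """
    # Direct effect of the feature direction on every output logit.
    scores = np.asarray(decoder_direction) @ np.asarray(W_U)  # shape (d_vocab,)
    order = np.argsort(scores)  # ascending: most negative first
    negative = [(vocab[i], float(scores[i])) for i in order[:k]]
    positive = [(vocab[i], float(scores[i])) for i in order[::-1][:k]]
    return negative, positive


# Toy example with a 2-dimensional model and a 4-token vocabulary.
W_U = np.array([[1.0, -1.0, 0.5, 0.0],
                [0.0, 0.0, 0.5, 1.0]])
direction = np.array([1.0, 0.5])
neg, pos = logit_lists(direction, W_U, ["a", "b", "c", "d"], k=2)
print(neg)  # [('b', -1.0), ('d', 0.5)]
print(pos)  # [('a', 1.0), ('c', 0.75)]
```

Sorting raw dot products (rather than, say, softmax probabilities) keeps the scores on the same signed scale as the values listed above.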
Activation Density: 0.000%
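Activation density is the share of sampled token positions on which the feature is active. A minimal sketch, assuming "active" means an activation strictly above zero (the threshold actually used for this page is not stated); `activation_density` is a hypothetical helper:

```python
import numpy as np


def activation_density(activations, threshold=0.0):
    """Percentage of token positions where activation > threshold.

    Assumption: a feature counts as firing only when its activation is
    strictly greater than `threshold` (0.0 by default).
    """
    acts = np.asarray(activations, dtype=float)
    if acts.size == 0:
        return 0.0  # no samples, so report zero density
    return 100.0 * float(np.mean(acts > threshold))


print(f"{activation_density(np.zeros(1000)):.3f}%")  # 0.000%
print(activation_density([0.0, 0.0, 1.2, 0.0]))      # 25.0
```

A feature that never fires on the sampled tokens reports 0.000%, consistent with the absence of known activations for this feature.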
No Known Activations
This feature has no known activations.