INDEX
Explanations
No Explanations Found
Negative Logits
Token       Logit
Thompson    -0.74
ãģ®         -0.72
âĢ¢âĢ¢      -0.71
Thom        -0.71
onto        -0.69
ãģį         -0.69
rued        -0.69
xit         -0.68
vt          -0.68
ety         -0.67
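Three of the tokens in this list (ãģ®, âĢ¢âĢ¢, ãģį) look like byte-level BPE vocabulary strings rather than readable text. The page does not name the tokenizer, so the following is only a sketch under the assumption that the vocabulary uses the GPT-2 byte-to-character table. `bytes_to_unicode` mirrors the helper of that name in GPT-2's encoder; `decode_token` is a hypothetical helper introduced here for illustration.

```python
def bytes_to_unicode():
    """Rebuild the GPT-2 byte -> printable-character table (assumed tokenizer)."""
    # Bytes that are already printable map to themselves.
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(0xA1, 0xAC + 1))
        + list(range(0xAE, 0xFF + 1))
    )
    cs = bs[:]
    n = 0
    # Every remaining byte is shifted to a code point at or above U+0100.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, (chr(c) for c in cs)))


# Inverse table: vocabulary character -> original byte value.
CHAR_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token):
    """Hypothetical helper: turn a byte-level vocabulary string into text."""
    raw = bytes(CHAR_TO_BYTE[ch] for ch in token)
    # A token may hold only part of a UTF-8 sequence, so do not fail hard.
    return raw.decode("utf-8", errors="replace")


for tok in ["ãģ®", "âĢ¢âĢ¢", "ãģį", "Thom"]:
    print(repr(tok), "->", repr(decode_token(tok)))
# 'ãģ®' -> 'の'
# 'âĢ¢âĢ¢' -> '••'
# 'ãģį' -> 'き'
# 'Thom' -> 'Thom'
```

If the assumption holds, the three entries are the Japanese characters の and き and a pair of bullet characters. Plain ASCII tokens pass through unchanged.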
Positive Logits
Token         Logit
egalitarian   0.74
herself       0.68
abase         0.65
unsus         0.64
resil         0.60
twins         0.59
admire        0.59
Firm          0.59
estim         0.58
igans         0.58
Activations Density: 0.000%
No Known Activations
This feature has no known activations.