Explanations
No Explanations Found
Negative Logits
Token     Logit
helm      -1.01
cius      -0.83
imore     -0.77
fred      -0.76
cmp       -0.73
daq       -0.72
arily     -0.71
edom      -0.71
ampunk    -0.71
arist     -0.71
Positive Logits
Token     Logit
44        0.74
25        0.74
26        0.73
56        0.70
58        0.69
2048      0.68
32        0.68
128       0.68
57        0.68
28        0.68
Activations Density 0.000%
This feature has no known activations.