Explanations
No Explanations Found
Negative Logits
happiest      -0.74
idity         -0.73
erity         -0.67
rity          -0.67
aughs         -0.66
rities        -0.64
Kinnikuman    -0.63
Monroe        -0.63
metic         -0.63
strongest     -0.62
Positive Logits
Blocks    0.85
dj        0.73
we        0.72
Chain     0.72
invoke    0.68
trip      0.68
Trip      0.67
rip       0.66
Lua       0.63
dan       0.63
Activations Density 0.000%
This feature has no known activations.