Explanations
No Explanations Found
Negative Logits
ahime          -0.71
Reply          -0.71
minded         -0.70
riors          -0.66
oppos          -0.66
mma            -0.65
20439          -0.62
NAS            -0.61
nai            -0.61
actionGroup    -0.61
Positive Logits
Dart           0.72
agon           0.70
Gaz            0.69
Fritz          0.69
Johannes       0.67
BuzzFeed       0.67
Fargo          0.66
Franken        0.65
tale           0.65
Bark           0.65
Activations Density: 0.000%
No Known Activations
This feature has no known activations.