Explanations
No Explanations Found
Negative Logits
quir            -0.77
dab             -0.70
terday          -0.68
aided           -0.66
participates    -0.66
fully           -0.66
behalf          -0.63
Arabia          -0.63
malf            -0.63
airplanes       -0.61
Positive Logits
anie            0.76
kees            0.75
Sims            0.72
ocial           0.69
img             0.67
asma            0.65
ondo            0.65
arte            0.64
Roose           0.63
uten            0.63
Activations Density: 0.000%
No Known Activations
This feature has no known activations.