INDEX
Explanations
No Explanations Found
Negative Logits
Token       Logit
Mehran      -0.68
IDENT       -0.67
IVERS       -0.67
CAST        -0.66
NER         -0.65
Reviewer    -0.63
PAR         -0.63
CHAR        -0.63
befriend    -0.62
Plot        -0.62
Positive Logits
Token       Logit
rums        0.70
theless     0.70
lain        0.70
oliath      0.69
arta        0.69
iku         0.68
bm          0.68
thur        0.67
ly          0.67
esty        0.66
Activations Density 0.000%
No Known Activations
This feature has no known activations.