INDEX
Explanations
No Explanations Found
Negative Logits
bound        -0.84
borg         -0.81
ZI           -0.78
talking      -0.76
sheets       -0.76
psych        -0.75
yon          -0.74
jug          -0.74
artifacts    -0.73
velt         -0.73
Positive Logits
Ellis        0.79
Cooke        0.77
Childhood    0.73
¶            0.69
Kers         0.68
Seed         0.67
///          0.66
ab           0.66
Lee          0.66
Decl         0.65
Activations Density 0.000%
No Known Activations
This feature has no known activations.