Explanations
No Explanations Found
Negative Logits
token      logit
arts       -0.75
reon       -0.71
inter      -0.68
auri       -0.67
parts      -0.67
evil       -0.67
ãĤ¬        -0.66
ios        -0.65
letters    -0.65
Rex        -0.63
Positive Logits
token       logit
fairly      0.73
relatively  0.72
moderately  0.71
sizeable    0.70
nom         0.68
versive     0.67
small       0.67
sizable     0.66
calibrated  0.65
slight      0.65
Activations Density 0.000%
This feature has no known activations.