INDEX
Explanations
references to the middle class and associated concepts
Negative Logits
:✨            -0.80
françaises    -0.77
Warhol        -0.73
enoord        -0.71
torchvision   -0.70
bewah         -0.69
HNO           -0.69
Harrell       -0.67
Nantucket     -0.65
tph           -0.65
Positive Logits
Middel        1.45
Middle        1.43
MIDDLE        1.37
Middle        1.33
middle        1.29
Middles       1.20
MID           1.20
MIDDLE        1.20
middle        1.19
Mid           1.10
Activations Density 0.080%
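
The density figure can be read as a simple ratio. The sketch below assumes that "density" means the fraction of sampled token positions on which the feature has a nonzero activation; this page does not confirm that definition. The function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical and chosen only to reproduce a value of the same magnitude.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold`.

    Assumption: this mirrors how a dashboard density could be computed;
    the page itself does not state the definition.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 10,000 token positions, 8 of which activate the feature.
sample = [0.0] * 10_000
for i in range(8):
    sample[i * 1000] = 1.5

density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.080% would correspond to the feature firing on roughly 8 of every 10,000 tokens.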