INDEX
Explanations
phrases describing different types or categories
Negative Logits
Token     Logit
romeda    -0.81
Rings     -0.81
umbai     -0.79
姫        -0.78
pload     -0.76
olulu     -0.76
Bots      -0.75
destro    -0.75
nuts      -0.74
iae       -0.73
Positive Logits
Token     Logit
face      1.49
faces     1.44
etter     1.19
etting    1.15
casting   1.07
ahead     1.02
cast      0.89
classes   0.89
alias     0.84
typ       0.83
Activations Density: 10.631%