INDEX
Explanations
repetitive phrases that indicate importance or significance
Negative Logits
overall           -0.66
overall           -0.63
Overall           -0.57
zweier            -0.57
Overall           -0.57
kokona            -0.56
styleType         -0.56
SourceChecksum    -0.52
another           -0.52
الحره             -0.50
Positive Logits
demás             0.67
stuff             0.64
paraphernalia     0.63
ingredients       0.57
aspects           0.56
things            0.54
SBATCH            0.52
remaining         0.52
facets            0.52
powy              0.51
Activations Density: 0.383%