INDEX
Explanations
abbreviations or special terms related to classifications or categories
Negative Logits
('/:-0.90
("/:-0.68
Ronnie
-0.67
Dunlap
-0.67
Dietz
-0.66
“
-0.65
resina
-0.65
y
-0.62
><!--
-0.61
ade
-0.61
Positive Logits
xs
1.59
XS
1.41
xs
1.36
XS
1.05
myſelf
1.02
styleType
1.01
Majefty
0.99
Monfieur
0.96
Chriftian
0.94
themſelves
0.90
Activations Density 0.001%