INDEX
Explanations
numerical measurements or data related to weight and dimensions
Negative Logits
ulas        -0.16
bane        -0.15
FUNC        -0.14
rane        -0.14
urus        -0.14
ilestone    -0.14
agini       -0.14
alam        -0.14
prostituer  -0.14
Fcn         -0.13
Positive Logits
pro      0.33
je       0.25
punk     0.20
Âłje     0.19
Mess     0.19
mess     0.18
maximal  0.17
wert     0.17
Je       0.17
ante     0.17
Activations Density 0.020%