INDEX
Explanations
phrases indicating a type or category
Negative Logits
Token          Logit
volontà        -0.60
colère         -0.59
suficientes    -0.57
ragioni        -0.56
piú            -0.56
many           -0.56
atât           -0.54
several        -0.54
medesimo       -0.54
essenziale     -0.54

Positive Logits
Token          Logit
"):            0.96
ratulations    0.90
^(@)           0.90
thing          0.88
uesday         0.85
)':            0.85
}}]{           0.84
relationship   0.84
)":            0.84
olesale        0.84

Activations Density 0.209%