Explanations
including or listing details
Negative Logits
worded      0.55
hurtful     0.54
really      0.52
scary       0.51
horrible    0.50
decirlo     0.50
stressful   0.49
messed      0.49
Honestly    0.47
easier      0.47
Positive Logits
including   0.72
включает    0.66
incluindo   0.63
частности   0.63
зокрема     0.62
featuring   0.61
Including   0.61
включая     0.61
including   0.60
,           0.59
Activations Density: 0.000%