INDEX
Explanations
important to note, understand, remember
Negative Logits
enfrent        0.83
feelings       0.80
micrófono      0.79
ieder          0.75
secrets        0.75
respostas      0.74
trắng          0.73
connues        0.73
raccont        0.73
sentimientos   0.72
Positive Logits
Correct        0.81
Incorrect      0.78
Incorrect      0.78
Confusion      0.77
Confusion      0.76
subset         0.76
Correction     0.75
incorrectly    0.74
Wrong          0.72
Correct        0.71
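Logit lists like the two above are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and ranking vocabulary tokens by the result. The sketch below assumes that convention; the functions `logit_effects` and `top_logits`, the toy matrices, and the toy vocabulary are hypothetical and illustrative only. It does not reproduce the exact scores shown here, which may be normalized or reported as magnitudes.

```python
from typing import List, Sequence, Tuple


def logit_effects(decoder_dir: Sequence[float],
                  unembed: Sequence[Sequence[float]]) -> List[float]:
    """Project a feature's decoder direction onto every vocabulary token.

    Assumption: unembed is indexed [residual_dim][token] (d_model x d_vocab).
    """
    d_model = len(decoder_dir)
    d_vocab = len(unembed[0])
    return [
        sum(decoder_dir[i] * unembed[i][t] for i in range(d_model))
        for t in range(d_vocab)
    ]


def top_logits(decoder_dir: Sequence[float],
               unembed: Sequence[Sequence[float]],
               vocab: Sequence[str],
               k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most promoted and the k most suppressed tokens."""
    effects = logit_effects(decoder_dir, unembed)
    ranked = sorted(zip(vocab, effects), key=lambda pair: pair[1])
    positive = list(reversed(ranked))[:k]  # largest effects first
    negative = ranked[:k]                  # most negative effects first
    return positive, negative


# Toy example: 2-dimensional residual stream, 3-token vocabulary (hypothetical).
vocab = ["Correct", "Wrong", "secrets"]
unembed = [[1.0, 0.0, -1.0],
           [0.0, 2.0, 0.0]]
decoder_dir = [1.0, 1.0]
positive, negative = top_logits(decoder_dir, unembed, vocab, k=1)
print(positive)  # [('Wrong', 2.0)]
print(negative)  # [('secrets', -1.0)]
```

In a real model the same projection is a single matrix-vector product over the full vocabulary; the pure-Python loops are only there to keep the sketch dependency-free.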
Activation Density: 0.118%
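Activation density is usually reported as the percentage of sampled tokens on which the feature's activation is nonzero. The sketch below assumes that definition; the function name `activation_density`, the threshold of 0, and the toy activations are illustrative assumptions, not the page's actual computation.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens on which the feature activation exceeds threshold."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Toy example: the feature fires on 2 of 1,000 tokens.
acts = [0.0] * 998 + [3.1, 0.7]
print(activation_density(acts))  # 0.2
```

Under this definition, the reported 0.118% means the feature is active on roughly 1 token in 850 (1 / 0.00118 ≈ 847).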