Explanations
words and phrases indicating functionality and normal operation
Negative Logits
şık              -0.43
necesariamente   -0.41
Tra              -0.41
setz             -0.41
le               -0.40
tieg             -0.39
árol             -0.39
ela              -0.39
عش               -0.39
rettet           -0.38
Positive Logits
snippetHide      0.99
undamaged        0.97
healthy          0.95
undisturbed      0.91
satisfactory     0.89
unharmed         0.86
flawless         0.85
正常             0.84
intact           0.84
satisfactorily   0.82
Activations Density: 0.474%