Explanations
references to typical characteristics or common occurrences
Negative Logits
ализи    -0.14
andy     -0.14
latin    -0.14
uhan     -0.14
بت       -0.14
bart     -0.14
imbus    -0.14
uel      -0.14
flo      -0.13
eut      -0.13
Positive Logits
usual          0.30
usual          0.25
typical        0.24
fare           0.21
suspects       0.21
typ            0.20
traditional    0.18
standard       0.18
-standard      0.17
Typ            0.17
Activations Density 0.194%
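
As a rough illustration of how a density figure like the one above could be computed, the sketch below treats density as the fraction of token positions on which the feature's activation exceeds a threshold. That definition, the function name activation_density, the threshold default, and the sample data are all assumptions made for illustration; the source does not state how the 0.194% was derived.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumed definition for illustration only; the dashboard's exact
    computation is not given in the source.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 1000 token positions, 2 of which are active.
sample = [0.0] * 998 + [1.7, 0.4]
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.194% would mean the feature fires on roughly 2 of every 1,000 tokens.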