INDEX
Explanations
phrases indicating specific groups or quantities
Negative Logits
roc      -0.08
ovol     -0.07
bol      -0.06
ельно    -0.06
ubl      -0.06
фор      -0.06
ci       -0.06
sson     -0.06
arts     -0.06
sg       -0.06
Positive Logits
hek      0.07
odo      0.07
graded   0.07
itus     0.06
Them     0.06
Them     0.06
maal     0.06
iasi     0.06
peak     0.06
津       0.06
Activations Density 0.022%