Explanations
negation or objection phrases
Negative Logits
Token            Logit
engertian        -0.55
forName          -0.53
zirc             -0.53
alnız            -0.53
marten           -0.53
ORIAL            -0.53
torchvision      -0.52
GLASS            -0.52
orten            -0.52
regioni          -0.51
Positive Logits
Token            Logit
etc              0.92
<=",             0.74
etc              0.73
kháu             0.72
للمعارف          0.70
Италијани        0.67
"..\..\          0.66
kasarigan        0.64
Administrativna  0.61
何より             0.60
Activations Density: 0.237%