Explanations
phrases that express meaning or intention
Negative Logits
"]);      -0.70
")));     -0.69
]));      -0.68
"];       -0.68
')));     -0.68
])):      -0.67
riwal     -0.67
%";       -0.66
`,        -0.65
蚪        -0.64
Positive Logits
meant      0.96
mean       0.91
meant      0.81
MEAN       0.78
referring  0.77
Mean       0.76
bedoeld    0.73
means      0.72
bedo       0.72
dimaksud   0.68
Activations Density 0.247%
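
The page does not define how the density figure is computed. A common reading, assumed here, is the fraction of sampled token positions on which the feature's activation is above zero. The sketch below illustrates that reading only; the function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical and not taken from the page.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of token positions where the
    feature fires at all. The source page does not state its definition.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 10,000 token positions, 25 of which are active.
sample = [0.0] * 9975 + [1.3] * 25
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.250%
```

Under this reading, a density of 0.247% would mean the feature fires on roughly 1 in 400 sampled tokens.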